FOREWORD



Earlier efforts by LOGICON to develop a real-time connected speech recognition
system resulted in a system that uses hardware designed for isolated
word recognition (IWR), enhanced with connected speech recognition software.
This LISTEN system was reported in a series of technical reports
referenced herein.

The effort reported here has developed two products to support the
concept of using high quality acoustical hardware, such as that used for
IWR, in conjunction with sophisticated software for connected speech
recognition. One product is a set of software for formation of voice reference
patterns. The second product is a user's manual, included as an appendix
here, which details the techniques required to form reliable reference data.




R. BREAUX, Ph.D.
Scientific Officer 
SECTION I
INTRODUCTION 



PURPOSE 

This report documents the work accomplished and results obtained during
the Voice Interactive Analysis System (VIAS) study project.

BACKGROUND 

The VIAS study was undertaken as part of a continuing effort to obtain a
capability for automatic recognition of connected speech which meets the
requirements of the Naval Training Equipment Center (NAVTRAEQUIPCEN) for
application in training systems. It is the natural outgrowth of previous projects
which led to the development of Logicon's Initial System for the Timely
Extraction of Numbers (LISTEN), a minicomputer-based, real-time connected
speech recognition system.

Projects which led to the development of LISTEN in December of 1977 did
not include extensive testing of that system, with the result that at their
conclusion the potential of LISTEN to support naval training applications was
not unambiguously demonstrated. Good speech recognition accuracy had been
obtained for one speaker (MWG), and poor but ambiguous test results were obtained
for another speaker (BRO), apparently due to equipment problems or anomalous
changes in the second speaker's voice.

At the termination of LISTEN's development it was also very difficult to
generate the voice reference data needed to use the system with a new speaker
or a new vocabulary. LISTEN relies heavily on processing a large sample of
voice data in order to produce a large amount of structural and statistical
data descriptive of the speaker's voice, with these data in a form suitable to
support real-time connected speech recognition. The voice sample processing
programs left after developing LISTEN were mostly dual purpose programs,
serving to support both research into the nature of the voice data, and the
extraction of voice parameters once those characteristics with promise for
recognition had been identified. The processes used included minicomputer
programs, programmable calculator procedures, manual graphing and manual
calculations. Upwards of forty hours of both minicomputer and manual data
processing were required to develop the voice reference data.

The VIAS study was thus undertaken with two main purposes: to further
test and analyze LISTEN's performance, and to bring together the collection of
voice reference data generation procedures into a coherent set of computer
programs which could be delivered to the government. The additional test and
analysis of LISTEN was to be based upon a set of computer programs for
automatically classifying and gathering performance data. Two auxiliary goals were
also attached to the project. The first of these was to transfer LISTEN technology
from the speech preprocessor (feature extractor) with which it was originally
developed to a newer version of that device, as the previous model has gone
out of production. The second was the extension upward of LISTEN's initial
vocabulary size of eleven words, as far as could easily be managed without major
software modification, toward thirty words.

REPORT OVERVIEW

Five groups of tasks were identified in the VIAS Project Work Plan Report,
appropriate to the project goals just described. The relationship of each
individual task to this report is described below.

TASK GROUP 1 - TECHNOLOGY TRANSFER. This group included four tasks
addressing the problem of transferring LISTEN technology from the Threshold
Technology Model VIP-100 speech preprocessor to its replacement, Model TTI-500.
Tasks 1a and 1b entailed gathering speech data for a single speaker, and using
the previously developed computer program GZEC to discover structure in those
data. These tasks are not described in detail, as their purpose was to
provide the data for Tasks 1c and 1d. Task 1c was a major analytic task,
directed toward determining which acoustic features extracted by the TTI-500
preprocessor are most useful for recognition. This analysis is described
extensively in Section IV. Task 1d, directed toward verification of the
feature selection, is also reported in that section.

TASK GROUP 2 - DEVELOP THE VOICE DATA GENERATION SYSTEM (VDGS). The four
tasks in this group brought together the various procedures used for generating
voice reference data to support real-time speech recognition by LISTEN, in the
form of a unified body of computer programs and a user's manual. Tasks 2a,
2b and 2c were programming tasks and are not reported upon further. Their
end result is the VDGS, a set of computer programs constituting a separate
deliverable of this project. The fourth task, 2d, was to produce a user's
guide for the VDGS, which is introduced in Section II of this report and
included in its entirety as Appendix A.

TASK GROUP 3 - EXPAND VOCABULARY. The single task in this group was executed
in conjunction with Tasks 2a, 2b and 2c. It entailed increasing the maximum
number of vocabulary items which can be accommodated by the individual
programs of the VDGS, when practicable, toward thirty words. Results obtained
are discussed in Section II, in connection with the VDGS.

TASK GROUP 4 - DEVELOP PERFORMANCE ANALYSIS SUBSYSTEM (PASS). The four tasks
in this group entailed the design, implementation and application of a new set
of computer programs for collecting and organizing data about LISTEN's
recognition performance. Also, the initial task, 4a, was directed toward converting
the real-time recognition components of LISTEN (the programs LTRGEN, MEX and
MINT) to operate in a new computer (Data General S-130) and speech preprocessor
(TTI-500) environment. The programming tasks, 4a, 4b and 4c, are not described
further, as their end result is the set of programs comprising PASS, a separate
deliverable. The programs are, however, introduced in Section III of this
report, and a Users Guide for those programs is included as Appendix B. Task
4d, the application of some elements of the PASS to automatically classify
recognition errors committed by LISTEN, is described in Section IV.

TASK GROUP 5 - CRITICALLY EXAMINE INFORMATION SOURCE MODELS. The four tasks
in this group were directed toward a detailed examination of the strengths and
weaknesses of LISTEN, by determining the relative importance of the various
information sources used in that system to achieve recognition. As these were
all analytical tasks, they are discussed extensively in Section IV.

KNOWLEDGE OF LISTEN ASSUMED. As LISTEN is a complex system, based on some
unique approaches to obtaining automatic recognition of connected speech, this
report would become excessively long if the principles and details of operation
of LISTEN were described in a self-sufficient way here. The remainder of this
report is therefore written assuming the reader has an understanding of LISTEN,
to a level of detail easily accessible in the final reports of the projects
which led to its development. For convenience, these reports are identified
below.

a. Use of Computer Speech Understanding in Training: A Preliminary
Investigation of a Limited Continuous Speech Recognition Capability; Technical
Report NAVTRAEQUIPCEN 74-C-0048-2; Logicon, Inc.; June 1977.

b. LISTEN: A System for Recognizing Connected Speech Over Small, Fixed
Vocabularies, In Real Time; Report NAVTRAEQUIPCEN 77-C-0096-1; Logicon, Inc.;
April 1978.
SECTION II
THE VOICE DATA GENERATION SYSTEM (VDGS) 



DESCRIPTION 

The VDGS consists of a collection of computer programs for collecting and
processing voice data to generate the voice reference data necessary to
recognize connected speech in real time with LISTEN. The end product of these
programs is a large set of data in the format of a standard Data General data file,
called the MIND file.

The twenty-nine programs comprising the VDGS are written in FORTRAN IV,
FORTRAN V and Data General Assembly Languages. The programs are capable of, and
intended for, use on a Data General S-130 minicomputer equipped with at least
32K words of memory, a 10-megabyte disc, the RDOS operating system and standard
peripherals. They may, however, be recompiled for execution on other Data
General minicomputers, such as the Nova 3.


Appendix A is a Users Manual for the VDGS. It contains instructions whereby
a qualified speech research technician, familiar with the principles and
details of LISTEN's operation, can collect speech data (given the necessary
equipment) and produce a MIND file for use with LISTEN.

The VDGS contains all programs necessary to collect speech data and
produce a MIND file. Of the twenty-nine programs, twenty-four must be used in this
process. The remaining five programs are often useful, but in general are not
needed to produce MIND files. All manual and extra-computer procedures
required to generate voice reference data prior to this project have been
automated and implemented as programs in the VDGS. However, human surveillance and
occasional modification of the generated data are essential if the recognition
performance of LISTEN is to be optimized. The VDGS therefore exists in two
forms: as a collection of independent programs for individual execution, and as
a "pushbutton" system requiring a minimum amount of human intervention, known as
CHAINMIND.

CHAINMIND consists of three segments: EXTRACT, GENTL and MAKEMIND. EXTRACT
is a program which facilitates gathering speech data samples in a format
suitable for use by the remainder of the VDGS. It includes prompting of the speaker
via the CRT display, with utterance contents taken from files provided. Since the
voice data are usually taken over several separate recording sessions (perhaps
over several days), there is a natural division of the MIND file generation
process at that point where all necessary speech data have been recorded on
disk, and the voice data processing can begin.

Another reason the EXTRACT process is kept separate from the remainder of
the CHAINMIND version of VDGS is that a decision must be made at that point with
regard to separating the collected voice samples (which consist mostly of
multiple-word utterances) into individual vocabulary items. This can be done
either automatically or manually. However, initial results using the
automatically generated individual vocabulary examples indicate that reasonable
recognition results cannot be obtained in this way. (See Section IV for
specifics about Example Set generation.) The User's Manual and the VDGS contain
instructions and aids for generating vocabulary item examples manually.

The second segment of the CHAINMIND version of the VDGS, GENTL, consists
of programs which culminate in the generation of Transition Letter Sets for
each vocabulary item. Although this process needs no human intervention, the
Transition Letter Sets are so fundamental to the successful operation of LISTEN
that prudence dictates that they should be examined and in some cases modified
before continuing the MIND file generation process. This is particularly true
since the method used to generate Transition Letter Sets (the algorithm
GENRLIZ in the program GZEC) is heuristic in nature and subject to the
influence of extraneous details, such as the order in which speech samples are
presented to it.

The third segment of CHAINMIND contains the majority of the programs and
requires the majority of processing time. Here too, prudence dictates human
surveillance of every step of the process if LISTEN's performance is to be
optimized. Critical points at which intervention may be required cannot be
identified at this time, as only a small number of speakers' data have been
processed. The Users Manual contains some suggestions and remarks which may
be helpful in identifying anomalies.

For reasons mentioned above, it is recommended that the VDGS be used as 
a collection of individual program elements in accordance with the Users 
Manual, with careful scrutiny of results at every step. 

VOCABULARY SIZE 

LISTEN was initially developed for an eleven-word vocabulary, and many
of the programs now included in the VDGS were developed to operate with about
that many vocabulary items. Under Task 3a, the vocabulary capacity of many of
these programs has been extended toward thirty, and programs developed during
this project for inclusion in the VDGS have, as far as possible, been
constructed to accommodate the larger vocabulary.



The tabulation below gives the program name and the vocabulary size
capability of the individual programs in the VDGS, as delivered.



    Program     Vocabulary size

    EXTRACT     any
    INVERT      30
    MUITS       30
    ESG         (not given)
    CROAK       15
    GLOW        30
    GZEC        13
    REVEX       13
    TAILOR      30
    RESCUE      30
    ADDER       13
    BUILDER     any
    SIGH        30
    AVRAJ       13
    DEALER      13
    LOOPER      13
    CRAP        13
    PHEW        30
    REVEXA      13
    GAPSTER     13
    GASP        15
    RVOIT       13
    SORTA       any
    ESDIT       11
    COVERT      15
    SORTS       any
    ESGDIT      13

Extending the vocabulary size capability of the programs in VDGS which
are not yet capable of handling thirty vocabulary items would require more
or less extensive modification of those programs. Doing so during this
project was judged an inappropriate use of project resources, as discussed in the
Work Plan Report. In this connection, it should be noted that two of the
three programs needed for real-time recognition (MEX and MINT) are limited to
thirteen vocabulary items, and modification of one of them (MEX) to
accommodate a larger vocabulary would require several labor months of effort. The
difficulty of such an extension stems from the complexity of the MEX data
structure, not from any inherent limitation of the recognition algorithm.
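The consequence of the tabulation can be illustrated with a short sketch. It assumes, as a simplification not stated in this report, that the vocabulary capacity of any chain of programs is bounded by the smallest capacity among them. The capacities are transcribed from the tabulation; the function names are hypothetical, and ESG is omitted because no value is given for it.

```python
import math

# Capacities transcribed from the tabulation; "any" is modelled as infinity.
# ESG is left out because the tabulation gives no value for it.
# GLOW and DEALER are readings of names that are partly illegible in the source.
CAPACITY = {
    "EXTRACT": math.inf, "INVERT": 30, "MUITS": 30, "CROAK": 15, "GLOW": 30,
    "GZEC": 13, "REVEX": 13, "TAILOR": 30, "RESCUE": 30, "ADDER": 13,
    "BUILDER": math.inf, "SIGH": 30, "AVRAJ": 13, "DEALER": 13, "LOOPER": 13,
    "CRAP": 13, "PHEW": 30, "REVEXA": 13, "GAPSTER": 13, "GASP": 15,
    "RVOIT": 13, "SORTA": math.inf, "ESDIT": 11, "COVERT": 15,
    "SORTS": math.inf, "ESGDIT": 13,
}


def programs_below(capacity, target):
    """Return (name, limit) pairs for programs that cannot handle `target`
    vocabulary items, ordered by limit and then by name."""
    short = [(name, limit) for name, limit in capacity.items() if limit < target]
    return sorted(short, key=lambda pair: (pair[1], pair[0]))


def chain_limit(capacity, programs):
    """Largest vocabulary that every program in `programs` can accommodate
    (assumption: the chain is limited by its most restrictive member)."""
    return min(capacity[name] for name in programs)


if __name__ == "__main__":
    for name, limit in programs_below(CAPACITY, 30):
        print(f"{name:8s} {limit}")
    print(chain_limit(CAPACITY, ["EXTRACT", "GZEC", "TAILOR"]))
```

Which of the listed programs belong to the twenty-four required for MIND file generation is not stated here, so the sketch takes the chain as an explicit argument rather than guessing it.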
SECTION III
THE PERFORMANCE ANALYSIS SUBSYSTEM (PASS) 

The PASS consists of three computer programs for collecting, processing 
and plotting data useful in analyzing many aspects of LISTEN'S performance. 
These programs are supported by a special version of LISTEN in which the MEX 
program produces data files in the format required for processing by the PASS 
program BIGMINT. The other two PASS programs (STATSUM and LICVAT) operate on 
data provided by BIGMINT.

The programs in the PASS operate in the same environment as LISTEN and the
VDGS.

Appendix B is a Users Manual for the PASS. It contains instructions for
using these programs for extracting and processing data from files produced by 
LISTEN. Section IV of this report describes several analytical investigations 
which were based on data derived and processed by the PASS. Those analyses 
are thus examples of the varied possible uses of data generated by the PASS. 

Specific information elements developed by programs in PASS are discussed 
below. 

BIGMINT. This program provides the following data: 

a. An annotated listing of the entire MIND file.

b. Date and identification of the MEX-generated file used to produce the
following data items for each utterance in that file.

c. Compressed speech data file identifier.

d. Potential recognitions detected by MEX, with the following data for
each potential recognition.

(1) Machine type

(2) Vocabulary item

(3) T-state counter statistic QT

(4) L-state counter statistic QL

(5) Start time

(6) Recognition time

(7) Associated vocabulary items and forms

(8) A priori cost

(9) Violation category cost 

(10) QT cost 

(11) QL cost 

(12) Total cost reported by MEX 

(13) Association cost 

(14) Total cost assigned to the potential recognition in MINT 

(15) Identification of optimal predecessor 

(16) Interword gap cost to optimal predecessor 

(17) Total cost from this node upward along optimal path. 

e. Costs for all interword gaps between potential predecessors. 

f. Identification of the ten lowest cost paths through the graph of the 
utterance, and for each: 

(1) whether correct or incorrect 

(2) total cost 

(3) vocabulary items 

(4) nodes. 

g. Vocabulary items actually spoken.

h. For the entire file of utterances, the number correctly and the number
incorrectly recognized.
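Items d. (15) through (17) and item f. describe a lowest-cost path through a graph whose nodes are potential recognitions. The sketch below illustrates that idea with a generic dynamic programming search; it is not the MINT algorithm. The `Node` record, the `gap_cost` function and the numbers in the example are all hypothetical, and only the single best path is found, whereas BIGMINT reports the ten lowest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Simplified, hypothetical record for one potential recognition."""
    name: str     # vocabulary item, or "START" / "END"
    start: int    # start time, in arbitrary time units
    end: int      # recognition time, in the same units
    cost: float   # total cost assigned to this node


def gap_cost(prev, node):
    """Hypothetical interword gap cost: grows with the unexplained gap."""
    return 0.5 * (node.start - prev.end)


def best_path(nodes):
    """Return (total cost, node names) for the cheapest START-to-END path,
    or None if END cannot be reached."""
    ordered = sorted(nodes, key=lambda n: (n.start, n.end))
    best = {}  # node -> (cheapest cost of reaching it, optimal predecessor)
    for node in ordered:
        if node.name == "START":
            best[node] = (node.cost, None)
            continue
        # A predecessor must already be reachable and must end before
        # this node starts.
        candidates = [
            (best[p][0] + gap_cost(p, node) + node.cost, p)
            for p in ordered
            if p in best and p.end <= node.start
        ]
        if candidates:
            best[node] = min(candidates, key=lambda c: c[0])
    end = next(n for n in ordered if n.name == "END")
    if end not in best:
        return None
    # Walk back along the recorded optimal predecessors.
    names, node = [], end
    while node is not None:
        names.append(node.name)
        node = best[node][1]
    return best[end][0], names[::-1]


EXAMPLE = [
    Node("START", 0, 0, 0.0),
    Node("ONE", 0, 10, 3.0),
    Node("NINE", 0, 12, 5.0),
    Node("TWO", 12, 20, 2.0),
    Node("END", 20, 20, 0.0),
]

if __name__ == "__main__":
    print(best_path(EXAMPLE))
```

In the example the direct START-to-END connection is also a valid path; it corresponds to no spoken word and is simply more expensive than the path through ONE and TWO.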

STATSUM. This program provides the following data: 

a. An annotated listing of the entire MIND file. 

b. For each utterance processed by BIGMINT, the index of the utterance
within the MNSET, the identifier of the compressed speech data file, and what was
actually spoken.

c. For each of the ten best paths through the graph of the utterance, in
increasing order of total path cost:



(1) whether correct or incorrect

(2) total a priori cost

(3) total violation cost

(4) total QT cost

(5) total QL cost

(6) total of costs reported by MEX for all nodes of the path

(7) total association cost

(8) total of costs assigned by MINT to all nodes of the path

(9) initial delay cost

(10) total interword gap cost

(11) final delay cost

(12) total interword timing cost

(13) total of all costs for the path

(14) nodes of the path.



d. Category and type of the recognition problem posed by this utterance
(as defined in Section IV).

e. The difference in all costs listed in c. above, between all incorrect 
paths and the best correct path, or an indication that no best path exists. 

LICVAT. This program provides the information listed below. Several of the
quantities mentioned are defined in Section IV.

a. An enumeration of utterances in Category 0, with identification of 
the compressed speech data file and the index of the utterance within the 
MNSET. 

b. The enumeration of utterances in Category 1, of type other than (0,1), 
(1,0) or (1,1). For each utterance the following data are given: 

(1) compressed speech data file identifier 

(2) utterance index within its MNSET 
(3) whether it was correctly or incorrectly recognized

(4) all costs enumerated in item c. for the program STATSUM, for the
best path

(5) utterance type. 

c. An enumeration of utterances in Categories 2 and 3, with compressed
speech data file identifier and utterance index within its MNSET.

d. An enumeration of utterances in Category 1 in which the best incorrect
path is a direct start-to-end node connection (corresponding to no spoken
word), with compressed speech data file identifier and utterance index
within its MNSET.

e. A list of all utterances in Category 1, ordered on the basis of the
difference in costs of various kinds between the best incorrect and the best
correct path (the M value defined in Section IV). For each utterance the
following data are given:

(1) compressed speech data file identifier 

(2) utterance index within its MNSET 

(3) cost difference (M value) 

(4) whether correctly or incorrectly identified 

These ordered lists are generated for each of the cost contributions mentioned 
in connection with program STATSUM, item c. 

f. A computer-generated plot of the cumulative distribution of M values,
for all utterances and for incorrectly recognized utterances only. Histogram
data are also given for M increments of 10, for all utterances and incorrectly
recognized utterances only.

g. All data described in e. and f. above, but restricted to utterances 
in Category 1 of the following types: 

(1) (0,1) 

(2) (1,0) 

(3) (1,1) 

h. For all real recognitions, counts of the number of associated recog- 
nitions of each vocabulary item and form, and the total number of associated 
recognitions, by vocabulary item and form.
i. As in h., but for all artifactual recognitions.

j. For all real recognitions, counts of occurrences of each violation
category, and the total over all violation categories, by vocabulary item and
form. Also, the total number of real recognitions (over all vocabulary items)
of each violation category.

k. As in j. above, but for all artifactual recognitions.

l. For all real recognitions, a computer-generated plot of the cumulative
distribution of the QL linearizing function, f, with histogram data.

m. As in l. above, but for all artifactual recognitions.

n. The following data about initial delays, categorized by real versus
artifactual recognition and vocabulary item and form:

(1) total number of recognitions in the category 

(2) number of zero delay values 

(3) fraction of cases which were zero 

(4) the average of the non-zero initial delays.

o. As in n. above, but for final delays.

p. A computer generated plot of the cumulative distribution of the inter- 
word gap normalizing function, f, for all interword gaps between contiguous 
real recognitions. 

q. As in p. above, but for all interword gaps between recognitions and 
their potential predecessors, other than contiguous real recognitions. 
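The M value tabulations of items e. and f. can be illustrated as follows. This is a minimal sketch: the sign convention (best incorrect cost minus best correct cost, so that a positive M favors the correct path) is an assumption, since the definition is deferred to Section IV, and the function names and sample values are hypothetical.

```python
from collections import Counter


def m_value(best_incorrect_cost, best_correct_cost):
    """Cost difference between the best incorrect and best correct paths.
    Assumed sign convention: positive when the correct path is cheaper."""
    return best_incorrect_cost - best_correct_cost


def histogram(values, width=10):
    """Count values per bin of the given width; each key is the lower
    edge of its bin."""
    return Counter(width * (v // width) for v in values)


def cumulative(values):
    """Return (value, fraction of values <= value) pairs, one pair per
    distinct value, in increasing order."""
    counts = Counter(values)
    total = len(values)
    running = 0
    points = []
    for v in sorted(counts):
        running += counts[v]
        points.append((v, running / total))
    return points


SAMPLE_M = [-15, -3, 4, 4, 27]  # illustrative values only, not report data

if __name__ == "__main__":
    print(sorted(histogram(SAMPLE_M).items()))
    print(cumulative(SAMPLE_M))
```

The same two functions can be applied to the subset of incorrectly recognized utterances to obtain the second curve and histogram described in item f.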
SECTION IV

ANALYSES

The analyses performed in the VIAS study are reported in this section.
These analyses were major parts of Task Groups 1 and 5 and a minor part of
Task Group 4. The analyses fall naturally into two categories. The first
category (Task Group 1) is concerned with transferring the connected speech
recognition capability developed with the VIP-100 speech preprocessor to its
successor, the TTI-500. The second category (Task Groups 4 and 5) is
concerned with a critical examination of the LISTEN speech recognition algorithm,
to determine its strengths and weaknesses, in hopes of discovering fruitful
approaches to improving its performance and easing the task of applying it in
automated training systems.

Underlying each type of analysis were several steps, starting with
determination of the types of data needed, design of an algorithm for extracting
the data, implementation of a program or program segment for extracting the
data (as part of the Performance Analysis Subsystem, PASS) and, finally,
extracting and analyzing the data thus obtained. In the description of the
analyses presented below, only the nature of the data used, the data itself
and the analysis of the data are discussed. Designing, implementing and
exercising the relevant portion of the PASS are not discussed, although those
efforts consumed a significant portion of project resources. Instructions for
using the PASS to develop data of the type presented in connection with the
following analyses are given in Appendix B, the PASS Users Manual.

The remainder of this section is divided into five parts addressing: 

a. The experimental bases used in the analyses.

b. The transfer of technology (Task Group 1).

c. The contribution of each information source to recognition (Task 5d).

d. The analysis of recognition errors (Tasks 4c and 5d).

e. The critical examination of information source models (Tasks 5a and
5b).

EXPERIMENTAL BASES FOR THE ANALYSES 

Voice data were collected, and MIND files were created, for two new
speakers (LHN, JEP) during the course of this project. Voice data for testing were
also collected for these two speakers, and LISTEN was exercised on these data.
Finally, the performance analysis programs in the PASS were exercised on
output obtained from LISTEN for speakers MWG, LHN and JEP. In this way the
programs of the VDGS and the PASS were validated, and data were generated for
the analysis tasks of the project.
Speech data were collected in several (five to ten) sessions over a few
days. After collecting all of the data from each speaker, it was divided into
three equal parts called Training, Interim Test and Test data. These terms
were inherited from the LISTEN development project, wherein the second set of
data was used to test some initial concepts; in this project the Interim Test
data were simply used to extract certain speech characteristic data not
obtainable from Training data.

Each set of data consisted of six "Magic Number Sets". Each of these is
fifty-five utterances of from one to four words, arranged in a format which
makes the numbers appear to the speaker to be quite random. They are actually
a carefully balanced set of vocabulary items, combined in such a way that each
digit occurs an equal number of times, and the word "point" nearly as often, and
so that each transition between vocabulary items appears exactly once.

Each major data set (Training, Interim Test and Test) thus consisted of
three hundred thirty utterances containing one thousand fifty words.
Training data were used to generate structural characteristics of each vocabulary
item (Transition Letter Sets and Loop Letter Sets), and some statistical
properties of the voices were extracted from Interim Test data. Test data were
used only for testing purposes. Most results in the following are therefore
based on Test data. The exception is the investigation of statistical models,
where data are compared for Interim Test data and Test data, to determine the
validity of certain statistical assumptions.
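The data set sizes quoted above follow from simple arithmetic, reproduced here as a check. The constants are taken from the text; the variable names are hypothetical.

```python
SETS_PER_PART = 6         # Magic Number Sets in each of the three parts
UTTERANCES_PER_SET = 55   # utterances in one Magic Number Set
WORDS_PER_PART = 1050     # words in each part, as stated in the text

# Utterances in each of Training, Interim Test and Test.
utterances_per_part = SETS_PER_PART * UTTERANCES_PER_SET

# Words in one Magic Number Set, and average utterance length.
words_per_set = WORDS_PER_PART // SETS_PER_PART
mean_words_per_utterance = WORDS_PER_PART / utterances_per_part

# An utterance of k words contains k - 1 adjacent word pairs, so the
# number of within-utterance word pairs in one set is words minus utterances.
word_pairs_per_set = words_per_set - UTTERANCES_PER_SET

if __name__ == "__main__":
    print(utterances_per_part, words_per_set, word_pairs_per_set)
    print(round(mean_words_per_utterance, 2))
```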

The voice data taken from the two new speakers were processed differently,
with qualitatively different recognition performance results, as is described
in the following paragraphs. New data for only one speaker (LHN) were
therefore usable in the detailed analyses of LISTEN performance.

An experimental variable of considerable interest in connection with the
VDGS was also investigated in this study. This variable, Example Set
Generation, relates to the way in which speech data are separated into sets of
examples of individual vocabulary items. This step is necessary for the
generation of Transition Letter Sets by the program GZEC. Two approaches have
been used to segment the samples of connected speech. Originally, computer
printouts of speech preprocessor data were scanned by eye, and segments within
each utterance which contained each individual vocabulary item were identified
visually and recorded manually. These segments were selected to contain the
vocabulary item with high confidence, but with as little additional material
as possible, in the judgement of the person marking the data. This remains
the recommended way to produce the needed example sets.

As described in Reference 1, an automatic method of generating sets
of individual vocabulary samples has also been developed. This
procedure, embodied in the program ESG (Example Space Generator), applies
statistics derived from MWG's voice to excise segments of a multiword
utterance which contain individual vocabulary items with high confidence.
This, of course, entails a tradeoff between taking a large segment



NAVTRAEQUIPCEN 78-C-0141-1 

including extraneous material, and taking a smaller segment with attendant
increased risk of excluding some portion of the spoken word. Since the nature
of GZEC makes it much more sensitive to the deletion of parts of words than to
the inclusion of extraneous material, the safety factors used in ESG (to
accommodate statistical variability in articulation and still extract unclipped
vocabulary item examples) are quite high. Not surprisingly, the safety
factors used in ESG make the word length statistics derived from MWG's voice data
apply to other speakers as well. Using ESG to generate vocabulary item samples
is therefore an alternative to doing so manually when the VDGS is applied to a
new speaker. This is the additional experimental variable which was
investigated in this project, by using the manual procedure for LHN's voice data and
the automatic procedure embodied in ESG for JEP's voice data.

Although the Transition Letter Sets generated by these methods do not appear
qualitatively different (see Figures 1 and 2), the recognition performance
obtained using ESG was significantly inferior to that obtained by using the
manual procedure, as the following data show:
Detailed examination of LISTEN's performance for speaker JEP shows that
the Transition Letter Sets are not effective; MEX very frequently does not
detect a word actually spoken as potentially present in the utterance. This
occurs relatively infrequently for MWG and LHN. The difference in Transition
Letter Sets is presumably due to the extraneous material in the set of examples
from which the JEP Transition Letter Sets were derived.

The failure of MEX to detect the potential occurrence of an actually
spoken word also seriously perturbs the voice reference data extraction process,
which contributes to the poor performance. (Notice that recognition results
for JEP are better on Test data than on Interim Test data, even though
statistical data are extracted from the latter.) For these reasons, the data for
JEP, while contributing a significant result to the project as a whole, were
not used in the detailed examination of LISTEN, as its performance with bad
reference data is not indicative of its true potential.
Figure 2. Transition Letter Sets (TLS) for Vocabulary Items "ZERO"
through "NINE" and "POINT" for Speaker JEP, Generated from
Manually Produced Examples
The LISTEN connected speech recognition system was developed using Threshold
Technology Corporation's speech preprocessor Model VIP-100, which is no
longer being produced. Its successor, the Model TTI-500, is based on a similar
principle of operation, provides output which is identical to that of its
predecessor in terms of the electrical interface and digital format, and is
expected to be available for a considerable time into the future. It is therefore
both feasible and desirable to bring LISTEN into accommodation with the newer
version of the speech preprocessor.

The principal difference between the older and newer preprocessors is the
acoustical significance of some of the speech features recognized by the
respective devices. Only eight features are common to the two devices. In both
cases thirty-two features are determined to be either present or absent at a
nominal rate of 500 times per second, and this determination is encoded as two
sixteen-bit binary words, transmitted to the central processor during detected
periods of speech. As LISTEN was purposefully developed to discover and to
recognize patterns in a stream of binary data, without recourse to the acoustic
significance of the data, very little change in LISTEN is required to
accommodate the new preprocessor. As LISTEN uses only sixteen of the thirty-two
bits, or features, received from the preprocessor every two milliseconds, the
only requirements to adapt LISTEN to the new preprocessor are to select which
sixteen of the available thirty-two features to use, and to change the
interface accordingly. The analysis required in support of the transfer of
LISTEN technology to the new preprocessor thus reduces primarily to selecting
the features to use and secondarily to verifying the selection.
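
The adaptation just described amounts to a bit-selection step. The following is a minimal Python sketch of that step, not the LISTEN interface itself. It assumes, purely for illustration, that the two sixteen-bit words are named A and B, that bit k of a word carries feature k, and that the chosen features are listed as (word, bit) pairs; the function name and the example selection are hypothetical.

```python
from typing import List, Tuple

def select_features(word_a: int, word_b: int,
                    selection: List[Tuple[str, int]]) -> int:
    """Pack sixteen chosen feature bits into one sixteen-bit word.

    selection[i] names the source of output bit i as ("A" or "B", bit index).
    The word names and the bit numbering are illustrative assumptions.
    """
    if len(selection) != 16:
        raise ValueError("exactly sixteen features must be selected")
    packed = 0
    for out_bit, (word, src_bit) in enumerate(selection):
        source = word_a if word == "A" else word_b
        if (source >> src_bit) & 1:       # feature present in this sample
            packed |= 1 << out_bit        # copy it into its output position
    return packed

# Hypothetical selection: the low eight features of each word.
example_selection = [("A", i) for i in range(8)] + [("B", i) for i in range(8)]
frame = select_features(0x00FF, 0x0001, example_selection)
```

With this layout, changing preprocessors only changes the selection table, which is consistent with the observation that very little change in LISTEN is required.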

FEATURE SET SELECTION. The only constraint which must be met in selecting
sixteen of the thirty-two features available is that the long pause feature,
LP4, which indicates the end of an interval of vocalization, must be among
them. Any other fifteen features could be used in conjunction with LP4. The
vocalization indicator, LP4, must be included in the selected features because
it is used as the indicator for end-of-utterance processing in LISTEN. The
problem is thus reduced to selecting fifteen features among thirty-one
available.

Using a single feature several times, i.e., forming a sixteen-bit computer
word by selecting fewer than fifteen features (plus LP4) and setting several
bits equal to a single feature indicator, has no utility. This is because
LISTEN is sensitive only to the information content of each feature position,
so that the same results would be obtained by using a smaller number of
features, each represented only once. Since adding different features to a
pre-existing set of distinct features has the potential (at least) of
increasing the available amount of information about what was spoken, only
sets of fifteen distinct features need be considered.

An "ideal" method for selecting the set of features to be used would be to
directly evaluate the recognition performance obtained with alternate sets of
features. Any other method must be considered to be indirect, and must be
based on some assumptions about how the recognition performance would be
affected by different feature characteristics. Direct evaluation of even a few
alternative feature sets is quite impractical, however, as more than forty
hours of computer processing time is the minimum required to evaluate
recognition performance. Since there are over three hundred million subsets of
fifteen items taken from a set of thirty-one items, some method of
pre-selecting a (very much) smaller collection of alternatives must be used
anyway. Practical necessity therefore drives one to an indirect method of
selection.
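
The size of the search space quoted above can be checked directly. The short Python sketch below counts the fifteen-element subsets of thirty-one features (300,540,195) and, assuming the forty-hour minimum applies to each candidate set, estimates the total processing time an exhaustive evaluation would need. The variable names are illustrative only.

```python
import math

# Number of ways to choose fifteen features from the thirty-one available.
n_subsets = math.comb(31, 15)

# Assumption: the forty-hour minimum quoted in the text applies per feature set.
hours_per_evaluation = 40
total_years = n_subsets * hours_per_evaluation / (24 * 365)

print(f"{n_subsets:,} candidate feature sets")
print(f"roughly {total_years:,.0f} years of continuous processing")
```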

One indirect method of feature selection is to refer to authority. In this
case the unquestioned leading authorities on the acoustical significance of
features are the personnel at the preprocessor manufacturing facility, where
the circuitry for extracting the available features was developed. The
manufacturer (Threshold Technology, Inc.) most cooperatively suggested a set
of fifteen features which, in their judgement, would work well in the LISTEN
environment. Since LISTEN is a complex algorithm which had not been thoroughly
tested at the time, the manufacturer's suggested set of features must be
regarded as an informed opinion rather than a definitive solution to the
problem. This opinion is based on extensive testing of many different
features (presumably in the context of isolated word/phrase recognition,
which, while different in many practical respects from connected speech
recognition, should nevertheless exhibit similar sensitivity to the utility of
a feature for recognition). The set of features suggested by the manufacturer
is the set ultimately selected for use in LISTEN, for reasons described in the
following discussion.

An attempt was made to measure objectively the utility of each feature for
recognition. The approach used was to posit several different measures of
feature "quality", obtain values for these measures and analyze the results.
The measures posited were based on plausible judgements about observable
characteristics of a feature which carries a large amount of information
useful for distinguishing among vocabulary items.

This approach to evaluating features suffers several shortcomings, in 
spite of its intuitive appeal. Most serious of these shortcomings, perhaps, 
is the questionable nature of the assumption that features can be evaluated
individually. The recognition procedure used in LISTEN is based upon detecting 
the simultaneous presence, or absence, of several features in the preprocessor 
output. It is therefore possible that there is no measure of effectiveness of 
individual features, and only the effectiveness of sets of features can be given 
concrete meaning. The actual situation is probably intermediate between the 
inherent extremes. That is, there probably are indicators of individual fea- 
ture utility such that selecting those fifteen features with highest utility 
produces an excellent, but not necessarily the best possible, choice.

Another difficulty with evaluating quality measures of features is a practical
one. Data must be collected from a particular speaker or speakers, speaking
phrases from a particular vocabulary, spawning the difficult question as to
how valid the results might be for other speakers and other vocabularies. In
the VIAS project the available resources allowed examining data collected from
a single speaker (LHN), speaking only the digits. (The primary limitation here
was the labor required to do the analysis in a timely manner, as voice data
were available from several other speakers.)

A third difficulty with basing feature selection on evaluation of some
intuitively appealing measures of feature quality is that the quality measures
themselves are entirely ad hoc, as it is not practical to test the quality
measures, for the same reasons that it is not practical to evaluate
alternative selections of features.

On the positive side, there is a possibility that meaningful individual 
feature quality measures can be posited and their evaluation may give some 
clear indication of the utility of at least some features. The quality meas- 
ure approach was followed in the hope that this would be the case. 

Quality Measures. Six measures of individual feature quality were posited and
evaluated. One of these (VFO) is defined in terms of the frequency of
occurrence of a feature in various vocabulary items. The other five are
attempts to measure the amount of reliable "structure" (reliably occurring
sequences of feature-present/feature-absent zones) which exists in a large
sample of vocalizations of a given vocabulary item.

The quality measures were evaluated on a data set extracted by the TTI-500
while the subject (LHN) spoke various vocabulary items in connected
combinations. Individual vocabulary items were visually identified within
computer printouts of the features detected by the preprocessor. Distributing
the segmented connected speech data into example sets for each vocabulary item
provided the data base needed for evaluating the quality measures.

In order to evaluate quality measures other than the first, some way to
recognize and extract reliably occurring patterns of a feature's history
within each vocabulary item was needed. The program GZEC, incorporating the
algorithm GENRLIZ, was used for this purpose. (This program and algorithm are
part of, and described in connection with, the VDGS, and in previous LCSR
project reports.) GZEC was utilized two times for each vocabulary item (in
this case just the digits 0-9), first operating on data containing the
manufacturer's recommended set of fifteen features (hereafter called the
Initial Feature Set), and second operating on data containing only the other
sixteen features. GZEC extracted Transition Letter Sets from sixty-six
examples of each vocabulary item. These Transition Letter Sets exhibit the
pattern with which each feature occurs reliably in the sample of sixty-six
vocalizations of each vocabulary item. The feature quality measures (other
than the first) are therefore defined in terms of the patterns the feature
follows, as revealed in the Transition Letter Sets for each vocabulary item.

Since the Transition Letter Sets obtained for a collection of utterances are
dependent upon interaction between features, it cannot be assumed that this
method of processing treats all features identically. Attempts to eliminate
this potential bias are frustrated by the fact that GZEC can process at most
sixteen features in a single run, and there are hundreds of millions of ways
to select sixteen features from the available thirty-one.

Definition of Feature Quality Measures. The individual feature quality
measures used in this investigation are:

a. Variance of Frequency of Occurrence (VFO). If a feature occurs very
frequently in some vocabulary items, about half the time in others, and very
infrequently in still others, that feature would be useful for distinguishing
among some vocabulary items. The quantity VFO measures the vocabulary item
dependent variability of frequency of occurrence of a feature. It is the
variance, across vocabulary items, of the average frequency of occurrence of
the feature in each vocabulary item. It is determined by the equation:

$$\mathrm{VFO} = \frac{1}{|V|} \sum_{v \in V} \left( \mu_v - \bar{\mu} \right)^2$$

where

|V| is the number of vocabulary items,

$\mu_v$ is the average frequency of occurrence of the feature in samples of
vocabulary item v, and

$\bar{\mu}$ is the average of $\mu_v$ over all vocabulary items v.

b. Frequency of Zero and One (F01). Each feature position in the Transition
Letter Sets indicates the reliably occurring pattern of development of that
feature in the word. Each Transition Letter Set indicates that, at its
corresponding point in the word, the feature is either reliably present
(indicated by 1), reliably absent (indicated by 0), or not reliably either
present or absent (indicated by a blank). If a feature has a rich and reliable
pattern of occurrence and/or non-occurrence in a word, then the number of 0's
and 1's for that feature is large compared to the number of blanks. The
average fraction of Transition Letter Set occurrences which are zero or one is

$$\mathrm{F01} = \frac{1}{|V|} \sum_{v \in V} \frac{1}{T_v} \sum_{i=1}^{T_v} \left[ \#_0(T_{i,v}) + \#_1(T_{i,v}) \right]$$

where

|V| is the number of vocabulary items,

$T_v$ is the number of Transition Letter Sets found by GZEC for vocabulary
item v,

$T_{i,v}$ is the i-th Transition Letter Set for vocabulary item v, and

$\#_k(T)$ is 1 if the feature in question has value k (0 or 1) in Transition
Letter Set T, and 0 otherwise.

c. Vocabulary Variance of Frequency (VVF). While a feature with low frequency
of required presence or absence (low F01) must have marginal utility for
recognition, variability over vocabulary items of the frequency of required
presence or absence might indicate high vocabulary item dependence of the
feature. Thus VVF is defined to be the variance over vocabulary items of the
frequency with which the feature is required to be present or absent. That is,
VVF is the variance of F01 determined for each vocabulary item.

$$\mathrm{VVF} = \frac{1}{|V|} \sum_{v \in V} \left( \mathrm{F01}_v - \mathrm{F01} \right)^2$$

where

$$\mathrm{F01}_v = \frac{1}{T_v} \sum_{i=1}^{T_v} \left[ \#_0(T_{i,v}) + \#_1(T_{i,v}) \right]$$

and other quantities are defined in (b) above.
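
The first three measures can be illustrated with a short Python sketch. It is not the project's software. It assumes, purely for illustration, that the history of one feature within one vocabulary item is written as a string holding one character per Transition Letter Set ('1' for reliably present, '0' for reliably absent, a blank otherwise), and that the example frequencies and strings are invented.

```python
from typing import Dict, List

def population_variance(values: List[float]) -> float:
    """Variance with divisor |V|, matching the 1/|V| form of the definitions."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

def vfo(frequency_by_item: Dict[str, float]) -> float:
    """Variance, across vocabulary items, of the feature's average frequency."""
    return population_variance(list(frequency_by_item.values()))

def f01_per_item(column: str) -> float:
    """Fraction of Transition Letter Sets in which the feature is 0 or 1."""
    return sum(ch in ("0", "1") for ch in column) / len(column)

def f01(columns: Dict[str, str]) -> float:
    """Average over vocabulary items of the per-item fraction."""
    per_item = [f01_per_item(c) for c in columns.values()]
    return sum(per_item) / len(per_item)

def vvf(columns: Dict[str, str]) -> float:
    """Variance over vocabulary items of the per-item fraction."""
    return population_variance([f01_per_item(c) for c in columns.values()])

# Invented example data for a single feature and two vocabulary items.
example_frequencies = {"ZERO": 0.9, "ONE": 0.5}
example_columns = {"ZERO": "11 0", "ONE": "0  1"}

vfo_value = vfo(example_frequencies)
f01_value = f01(example_columns)
vvf_value = vvf(example_columns)
```

In this invented example the per-item fractions are 0.75 and 0.5, so the average is 0.625 and the variance over items is 0.015625.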



d. Average Number of Zero or One Zones (ANZ). Each feature tends to vary
rather regularly within a word, exhibiting zones where the feature is reliably
present or absent, bordered by zones where its occurrence is unpredictable.
The number of zones wherein the feature is either reliably present or absent
is an indication of structure as found for the vocabulary item by GZEC.

$$\mathrm{ANZ} = \frac{1}{|V|} \sum_{v \in V} Z_v$$

where |V| is the number of vocabulary items, and

$Z_v$ is the number of zones of required presence or absence of the feature,
as exhibited in the Transition Letter Sets for vocabulary item v.



e. Average Number of Zero/One Zone Reversals (ANR). As an indicator of the
richness or complexity of the reliably occurring pattern of a feature within a
word, one can count the number of reversals between zones of required absence
and required presence of the feature, ignoring any intervening zones where the
feature is not reliably present or absent. The result is

$$\mathrm{ANR} = \frac{1}{|V|} \sum_{v \in V} R_v$$

where |V| is the number of vocabulary items, and

$R_v$ is the number of reversals between zones of required absence and
required presence of the feature, or vice versa, in the Transition Letter Sets
for vocabulary item v.

f. Mean Log Probability of Acceptance (MLP). A feature which is almost always
absent but does reliably occur at some point within a vocabulary item (or vice
versa) is an effective rejection device for eliminating false recognitions. A
measure sensitive to this situation can be obtained by finding p, the
frequency with which a feature is present over all vocabulary items, and
computing the probability with which a random, uncorrelated sequence of zeros
and ones, wherein the ones occur with frequency p, would be accepted by the
Transition Letter Sets for that word. MLP is the negative natural logarithm of
that probability, and can be computed from

$$\mathrm{MLP} = -\frac{1}{|V|} \sum_{v \in V} \left[ \#_{0,v} \log(1 - p) + \#_{1,v} \log p \right]$$

where |V| is the number of vocabulary items,

$\#_{0,v}$ and $\#_{1,v}$ are the numbers of Transition Letter Sets for
vocabulary item v in which the feature is required to be absent (0) and
present (1), respectively, and

p is the average frequency with which the feature occurs in all vocabulary
items.
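
The zone-based measures (d) through (f) can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's software: the history of one feature within one vocabulary item is assumed to be a string with one character per Transition Letter Set ('1', '0', or a blank), a zone is taken to be a maximal run of identical '0' or '1' characters, and the example strings and the value of p are invented.

```python
import math
from itertools import groupby
from typing import Dict

def count_zones(column: str) -> int:
    """Count maximal runs of '0' or '1' (zones of required absence/presence)."""
    return sum(1 for value, _run in groupby(column) if value in ("0", "1"))

def count_reversals(column: str) -> int:
    """Count 0/1 changes after dropping the unreliable (blank) positions."""
    required = [ch for ch in column if ch in ("0", "1")]
    return sum(1 for a, b in zip(required, required[1:]) if a != b)

def anz(columns: Dict[str, str]) -> float:
    """Average number of zones per vocabulary item."""
    return sum(count_zones(c) for c in columns.values()) / len(columns)

def anr(columns: Dict[str, str]) -> float:
    """Average number of zone reversals per vocabulary item."""
    return sum(count_reversals(c) for c in columns.values()) / len(columns)

def mlp(columns: Dict[str, str], p: float) -> float:
    """Mean negative log probability that random data with P(1) = p is accepted."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    total = 0.0
    for column in columns.values():
        total += (column.count("0") * math.log(1.0 - p)
                  + column.count("1") * math.log(p))
    return -total / len(columns)

# Invented example data for a single feature and two vocabulary items.
example_columns = {"ZERO": "11 00 1", "ONE": "0  0"}
anz_value = anz(example_columns)
anr_value = anr(example_columns)
mlp_value = mlp(example_columns, p=0.5)
```

In the invented example, "ZERO" has three zones and two reversals while "ONE" has two zones and no reversals, so ANZ is 2.5 and ANR is 1.0.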

Evaluation and Analysis of Feature Quality Measures. Figure 3 shows estimates
obtained for the six quality measures described above. Each quality measure
is a non-negative number, with higher values suggesting greater utility of
the feature for speech recognition purposes.

As described earlier, each of these measures is an ad hoc construction
based on an intuitive concept of what characteristics of a feature might
indicate utility for recognition. If some of the measures evaluated are in
fact reliable and accurate indicators of utility for recognition, then it
would be expected that significant correlation would appear among those
measures.
Unfortunately, perusal of the data in Figure 3 shows poor correlation between 
all pairs of quality measures. This observation is borne out by the data in 
Figure 4, which shows the coefficient of correlation and coefficient of deter- 
mination (the square of the coefficient of correlation) between all pairs of 
quality measures. Reliable pairs of indicators would exhibit a large positive 
coefficient of correlation and coefficient of determination (both near +1). 
Since none do, there is at most one reliable and accurate quality indicator 
among those used. 
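
The pairwise comparison described above can be sketched in Python as follows. The measure names and values here are invented for illustration and are not the data of Figures 3 and 4; the sketch only shows how a coefficient of correlation and its square, the coefficient of determination, would be tabulated for every pair of measures.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

def correlation(x: List[float], y: List[float]) -> float:
    """Pearson coefficient of correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values of three measures over four features.
measures: Dict[str, List[float]] = {
    "M1": [1.0, 2.0, 3.0, 4.0],
    "M2": [2.0, 4.0, 6.0, 8.0],
    "M3": [4.0, 3.0, 2.0, 1.0],
}

# (coefficient of correlation, coefficient of determination) for each pair.
table: Dict[Tuple[str, str], Tuple[float, float]] = {}
for a, b in combinations(sorted(measures), 2):
    r = correlation(measures[a], measures[b])
    table[(a, b)] = (r, r * r)
```

A pair such as M1 and M3, which rank the features in opposite order, has a coefficient of determination near 1 but a negative coefficient of correlation, which is why both values are needed to judge agreement.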

The lack of consistency among all pairs of suggested quality measures is
quite remarkable, in view of the rational and intuitively appealing basis for
each of the individual measures. It appears that no two of the measures are
reliable and accurate indicators of feature utility. It remains possible,
however, that a consensus (if one exists) of the measures may be indicative of
feature utility. This possibility was investigated as described in the
following paragraphs.

Each of the tentative quality measures establishes an order of preference
(mathematically speaking, a partial order) on the set of features. This
preference structure is shown in Figure 5. In that figure, a feature in a main
vertical column is preferred over any other feature below it in the main
column. Features offset to the right are all preferred equally to the feature
immediately above in the main column. Features in the Initial Feature Set are
marked with an asterisk.

*Feature same for VIP-100 and TTI-500 (LP4 is not shown).

Figure 3. Estimates of Individual Feature Quality Measures

Figure 4. Coefficients of Correlation and Determination Between Pairs of
Feature Quality Measures

Figure 5. Preference Structure Induced on the Set of Features by the Six
Quality Measures

A useful concept for dealing with incompatible orders or preference structures
is Pareto optimality. In this application, a set of fifteen features is
Pareto optimal if there is no other set of fifteen features which is
preferable under each of the six preference structures. (A set S is preferable
to a set S' if and only if each member of S is not less preferable than any
member of S' and some member of S is definitely preferable to some member of
S'.) Starting with any set of features, one may derive from it a Pareto
optimal set by examining its elements one by one to determine if any feature
not in the set is uniformly at least as preferred under each quality measure,
and definitely preferred under at least one quality measure. If so, that
element is replaced by the preferred one, and the process is repeated until no
further change takes place.
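
The replacement procedure described above can be sketched in Python. This is a minimal illustration, not the analysis actually run in the project: the feature names and scores are invented, each feature is assumed to carry one score per quality measure with higher values preferred, and ties between several dominating candidates are broken alphabetically, which is an arbitrary choice.

```python
from typing import Dict, Sequence, Set

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every measure and better on one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_improve(selected: Set[str],
                   scores: Dict[str, Sequence[float]]) -> Set[str]:
    """Replace dominated members until no outside feature dominates a member."""
    current = set(selected)
    changed = True
    while changed:
        changed = False
        for member in sorted(current):
            outside = sorted(set(scores) - current)
            better = [f for f in outside
                      if dominates(scores[f], scores[member])]
            if better:
                current.remove(member)
                current.add(better[0])   # arbitrary tie-break: first by name
                changed = True
                break                    # restart the scan on the new set
    return current

# Hypothetical scores under two measures for four features.
example_scores = {"F1": (3, 3), "F2": (1, 1), "F3": (2, 2), "F4": (0, 5)}
result = pareto_improve({"F2", "F4"}, example_scores)
```

In the example, F2 is dominated by both F1 and F3 and is replaced by F1, while F4 is retained because no feature matches its score on the second measure. Each replacement strictly raises the total score of the set, so the loop must terminate.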

When this process is applied to the Initial Feature Set, it is found to be
very nearly Pareto optimal. For three members of this set of features there
is one uniformly preferred feature in the Complementary Set (the sixteen
features not in the Initial Feature Set): B15 is the only feature uniformly
preferable to A7 and similarly preferable to A8, and A5 is the only feature
preferable to A9. Three features, A1, A5 and A15, are all uniformly preferable
to A14. The Initial Feature Set can therefore be made Pareto optimal by
replacing A7 or A8 with B15, A9 with A5, and A14 with A1, A5 or A15.

An interesting consistency among these exchanges appears when the acoustical
meaning of the features is considered. Each of the features to be replaced
(A7 or A8, A9 and A14) is an indicator of high energy at some portion of the
spectrum, and the replacing features are mostly (B15, A5 and A1, but not A15)
more complex indicators either of specific phonemes or of more general
spectral characteristics, such as a positive energy slope over a range of
frequencies. It is tempting to infer that Pareto optimization, which in some
sense represents a consensus of the quality measures, reveals a preference for
the more complex features over the more basic spectral energy concentration
indicators. Some confidence in this interpretation, and in the indications of
the Pareto optimization results in general, might be justified if the
Complementary Feature Set were found to be far from Pareto optimal.
Unfortunately, this is not the case. Carrying out the optimization process for
the Complementary Feature Set requires only that B12 be replaced by B1 and B11
be replaced by A9, B1 or B10. The Complementary Feature Set is thus even more
nearly Pareto optimal than is the Initial Feature Set. This fact indicates
that the six putative quality measures are very incompatible and that any
subset of fifteen features is probably almost Pareto optimal.

The dismal failure of the quality measures to give clear indications of
differences among features and, in fact, to demonstrate anything at all, is an
indictment of any intuitive approach to evaluating feature utility. Apparently
a satisfactory evaluation of feature utility will have to await a more
penetrating analysis.

In the absence of any satisfactory indication of relative individual feature
utility for recognition, the Initial Feature Set was retained for use in the
remainder of the VIAS project. One virtue of this selection is that the
speech data gathered using this set of features, and results obtained with
them, extend the data set and results learned in other related projects (such
as the Laboratory Version AIC Training System), which use the Initial Feature
Set.

FEATURE SET VALIDATION (Task 1d). Selection of the Initial Feature Set for use
in the subsequent analyses in this project was validated by monitoring the per- 
formance of the entire LISTEN speech processing system operating with that 
selected set of features. The process of extracting speech characteristics 
for two speakers (LHN and JEP) was monitored especially carefully to detect 
any indication of individual feature peculiarity. Transition Letter Sets were 
extracted as usual from ninety-six examples of the eleven word LCSR vocabulary, 
using the GZEC program. Each feature clearly contributes to the recognizability
of at least some vocabulary items, and most features display regularity in
most vocabulary items for both speakers. The Transition Letter Sets obtained 
by GZEC are shown in Figures 1 and 2. Loop Letter Sets generated for the same 
speech data also failed to reveal any anomalous characteristic of any individ- 
ual feature. The Loop Letter Sets indicated that the Transition Letter Sets 
almost completely characterize the TTI-500 output for each vocabulary item, as 
they did for the VIP-100 output. That is, Loop Letter Set states are quite
infrequently entered, most words being recognized through a sequence of tran- 
sitions from one Transition Letter Set state to the next. 

The remainder of the voice data analysis process leading to the data base 
needed for real-time recognition is not easily related to individual feature 
characteristics. These processes include collecting data about the timing of 
transition and loop sounds (states) , violation and artifact (false alarm) 
rates, etc. However, these were monitored and no peculiarities attributable 
to, or suggestive of, individual feature anomalies were detected. 

Although no specific anomalies were noted in the process of extracting
voice reference data from speech samples for these two speakers, the
recognition accuracy which LISTEN exhibited for them was significantly
inferior to that obtained for MWG using the VIP-100. As described in
connection with Example Set generation, the poor performance for JEP can be
attributed to the method of generating individual vocabulary items, but MWG's
and LHN's voice data were processed in functionally identical ways. It remains
ambiguous, therefore, whether the difference in recognition performance
between these two speakers (MWG and LHN) is due to speaker peculiarities or
speech preprocessor differences, and if the latter, whether a different
selection of features might lead to better recognition accuracy.
Unfortunately, project resources did not permit resolution of this ambiguity.

SUMMARY OF TECHNOLOGY TRANSFER TASK RESULTS. Practical considerations forced
an indirect approach to choosing a set of features for use with LISTEN from
those available from the newer model speech preprocessor. An Initial Feature
Set was constructed following the preprocessor manufacturer's recommendations.
Six measures of individual feature utility were posited and evaluated, and the
results analyzed. The six measures were found to be pairwise incompatible to
a high degree. The Initial Feature Set was adopted for use in the remainder
of the study in the absence of any rationale for selecting another set,
because this extends the accumulated data and experience based on that set of
features.

In a qualitative sense, the previously developed LISTEN technology was
successfully transferred to the new preprocessor in all phases of the LISTEN
operation, including voice data collection, voice data analysis and reference
data generation, and real-time voice recognition, in the sense that no
qualitative change to LISTEN was required to achieve recognition. However,
inferior recognition performance for LHN, whose voice data were processed in
essentially the same way as MWG's, leaves it unclear as to whether LISTEN can
obtain similar performance with the two preprocessors. On a word basis
(counting all insertions, deletions and substitutions as errors) 95%
recognition was obtained for MWG and 89% for LHN, using Test data, without
speaker feedback. This marginal difference in performance could presumably be
due to either speaker or preprocessor differences.

CONTRIBUTION OF EACH INFORMATION SOURCE TO RECOGNITION

As described in Reference 1, the LISTEN speech recognition system has two
major subdivisions, implemented in programs MEX and MINT. MEX detects
sequences in the preprocessor output which exhibit the structural
characteristics of individual vocabulary items, and notifies MINT of these
potential recognitions. MEX, in the process, notes the presence of structural
and timing peculiarities (if any) of each potential recognition. MINT then
processes these data to distinguish between real recognitions and artifacts,
using information of various other kinds. Each of these information sources
is discussed separately in the following paragraphs.

CONTRIBUTION OF STRUCTURAL INFORMATION IN MEX. Structural data are used in
two ways in the LISTEN recognition procedure, as the description of MEX and
MINT just given shows. The expected structure of individual vocabulary items
is used by MEX to detect the potential presence of that item in the incoming
stream. The first use of structural information is thus an initial detection
function. Measures of its contribution are the frequency with which
vocalizations of words are not detected, and the frequency with which
artifacts are generated. These data are presented in Figure 6 for Test data.

                      Number of     Number of Vocabulary   Number of
                      Vocabulary    Items not Detected     Artifactual
Speaker/Preprocessor  Items Spoken  by MEX/Percentage      Recognitions

MWG/VIP-100           1049          7/0.7%                 2512
LHN/TTI-500           1054          21/2.0%                1269

Figure 6. Missed and Artifactual Recognitions in MEX Output



The difference in MEX failure and artifact production rates, between the two
speaker/preprocessor combinations, is quite remarkable. The artifact
production rate for LHN is half that for MWG, at the cost of three times the
MEX rejection rate. The detection of the potential presence of a word in the
speech signal is primarily dependent upon the combined discrimination
capabilities of the preprocessor features and the Transition Letter Sets.
Visual and quantitative comparison of the Transition Letter Sets for these two
speakers fails to reveal any substantive difference. (For example, both
speakers average 9.5 Transition Letter Sets per vocabulary item.) If there
were significant differences in the articulatory habits of the two speakers,
for example in enunciation precision, presumably the difference would be
evident as structural differences in their respective Transition Letter Sets.
As there are no apparent differences, it seems likely that the contrasting MEX
failure and artifact production rates are characteristic of the preprocessors,
or at least of the sets of features LISTEN accepts from the two preprocessors,
and not due to differences between the two speakers.

CONTRIBUTION OF OTHER INFORMATION SOURCES IN MINT. In the process of
detecting, by use of structural information, the potential occurrence of a
vocabulary item in the speech data stream, MEX also computes two measures of
how typical the time durations of various detected feature combinations are.
MINT thus receives from MEX notification of the occurrence of a potential
recognition of a particular type, start and end times of the potential
recognition, an indication of any detected structural peculiarities, and two
indicators of temporal peculiarity. Using the start and end times of the
potential recognition, MINT (in principle, at least) builds and operates upon
a directed graph representing the utterance. This directed graph consists of
a Start and an End node, together with one additional node for each potential
recognition. A pair of nodes is joined by a directed edge if and only if the
start and end times of the events are compatible with one node representing
the event immediately preceding the event represented by the other node. MINT
then computes the path through this directed graph, moving backwards from End
to Start, seeking the best explanation of what has been observed about the
utterance. In the process of doing this computation, MINT adds to the
structural violation and intraword timing data supplied by MEX, data about the
a priori probability that a potential recognition is real versus artifact,
about its coincidence in time with other potential recognitions, and about the
interword timing. All of these data are expressed numerically as a scaling
constant (-64) times the natural logarithm of the likelihood ratio for the
occurrence of what was actually observed. That is, the i-th information source
is summarized as a value

$$\mathit{AQ}_i = -64 \ln \frac{\mathrm{Prob}(\text{observation} \mid \text{real})}{\mathrm{Prob}(\text{observation} \mid \text{artifact})}$$

Those information sources relating to individual potential recognitions
produce AQ values associated with nodes, and the interword timing data produce
AQ values associated with edges of the directed graph.
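
The scaled log likelihood ratio used for every information source can be illustrated in a few lines of Python. The function name and the example probabilities are hypothetical; only the scaling constant of -64 and the use of the natural logarithm are taken from the text.

```python
import math

SCALE = -64.0  # scaling constant quoted in the text

def aq_value(prob_given_real: float, prob_given_artifact: float) -> float:
    """Scaled log likelihood ratio for one observation from one source."""
    return SCALE * math.log(prob_given_real / prob_given_artifact)

# Hypothetical observation that is ten times likelier for a real word.
favourable = aq_value(0.9, 0.09)
# Hypothetical observation that is equally likely either way.
neutral = aq_value(0.5, 0.5)
```

Because of the negative scaling constant, an observation that favours a real recognition yields a negative value, that is, a low cost, which is why the preferred path is the one with the minimum sum.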

The AQ values, attached by MINT to nodes and edges of the graph of poten- 
tial recognitions, are estimates of the scaled log likelihood ratios based on 
statistical models of each information source. The parameters of these statis- 
tical models are estimated from speech data during the voice data generation 
process. Validity of these statistical models and estimation procedures is 
examined in Tasks 5a and 5b, described later in this section. In this task, 
attention is directed to determining how effectively each information source, as 
represented by its associated AQ values, contributes to the recognition 
procedure. 

As shown in Reference 1, under suitable assumptions, the Bayes optimal 
solution to the problem of deciding which path through the graph is best reduces 
to the problem of finding the path with minimum sum of AQ values on nodes and 
edges. Evaluating an information source's contribution to recognition thus 
reduces to determining how effectively the AQ values help establish the correct 
path through the graph as the one with minimum total cost. Although MINT 
considers all possible paths through the graph of the utterance, correct
identification of the spoken words depends decisively on the AQ values attached
to two particular paths through the graph (when they both exist): the lowest
cost path of those which give the correct answer, and the lowest cost path of
those which give any incorrect answer. If these two paths exist, the correct
answer is found precisely when the total cost of the former is less than the
total cost of the latter. The effectiveness of the i-th information source,
in establishing a correct path as the chosen one, is thus indicated by the
difference between the sums of the AQ values along the best of the incorrect
paths and the best of the correct paths. Subtracting the latter from the
former gives a value which, when positive, indicates that the information
source in question is a productive contributor to selecting the correct path
but which, when negative, indicates that the information source is
counterproductive.
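
The minimum-cost path search discussed above can be sketched in Python as follows. This is not the MINT implementation: the graph, the vocabulary items and all cost values are invented, and a memoized recursion from Start is used for brevity in place of the backward pass from End described in the text. The two approaches give the same minimum on a directed acyclic graph.

```python
from functools import lru_cache
from typing import Dict, List, Optional, Tuple

def best_path(successors: Dict[str, List[str]],
              node_cost: Dict[str, float],
              edge_cost: Dict[Tuple[str, str], float]
              ) -> Optional[Tuple[float, Tuple[str, ...]]]:
    """Return (total cost, path) of the cheapest Start-to-End path, if any."""

    @lru_cache(maxsize=None)
    def solve(node: str) -> Optional[Tuple[float, Tuple[str, ...]]]:
        if node == "End":
            return node_cost.get(node, 0.0), (node,)
        best = None
        for nxt in successors.get(node, []):
            tail = solve(nxt)
            if tail is None:
                continue              # no route from nxt to End
            cost = (node_cost.get(node, 0.0)
                    + edge_cost.get((node, nxt), 0.0)
                    + tail[0])
            if best is None or cost < best[0]:
                best = (cost, (node,) + tail[1])
        return best

    return solve("Start")

# Hypothetical utterance with two competing potential recognitions.
example_successors = {"Start": ["five", "nine"], "five": ["End"], "nine": ["End"]}
example_node_cost = {"five": -120.0, "nine": 35.0}
example_edge_cost = {("Start", "five"): 10.0}
result = best_path(example_successors, example_node_cost, example_edge_cost)
```

Restricting the successor lists to paths that spell the correct answer, or to paths that do not, would give the two particular paths whose costs decide the outcome.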

The first measure used to evaluate the contribution of the i-th information
source is, for the reasons just given, defined to be

$$M_i = \sum_{\text{best incorrect path}} \mathit{AQ}_i \; - \sum_{\text{best correct path}} \mathit{AQ}_i$$

The measure of information source contribution just defined cannot be applied 
if the graph of the utterance does not contain at least one path giving a 
correct result and at least one path giving an incorrect result. Although 
there are several different possible reasons for such a situation arising, 
only one has been observed to arise commonly in practice. That is the failure 
of MEX to detect a word actually spoken and inform MINT of its existence as a 
potential recognition. Failures of this type occur only when the word as 
spoken does not have expected structural characteristics; i.e., when the word 
exhibits extensive structural violation. These cases are much in the minority. 
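
The measure just described can be sketched as follows. This is an illustration only, not the STATSUM implementation: the function name and the AQ values are hypothetical, and each path is assumed to be given as the list of one information source's AQ values on its nodes and edges.

```python
from typing import Optional, Sequence


def source_contribution(best_incorrect: Optional[Sequence[float]],
                        best_correct: Optional[Sequence[float]]) -> Optional[float]:
    """Return M for one information source, or None when M is undefined.

    Each argument lists that source's AQ values along one path
    (hypothetical representation). None means no such path exists.
    """
    if best_incorrect is None or best_correct is None:
        # No correct path (e.g. MEX missed a spoken word) or no incorrect path.
        return None
    # Best incorrect path total minus best correct path total.
    return sum(best_incorrect) - sum(best_correct)


# Hypothetical AQ values, for illustration only.
m_productive = source_contribution([30.0, 45.0, 20.0], [10.0, 15.0, 5.0])
m_counter = source_contribution([5.0, 5.0], [20.0, 10.0])
m_undefined = source_contribution([5.0, 5.0], None)
```

A positive result marks the source as productive for that utterance, and a negative result marks it as counterproductive.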

The measure M gives a value to the contribution of each information source 
towards correct recognition in each utterance. To summarize the utility of 
the information source over many utterances requires some approach to dealing 
with the collection of M values for each utterance. One approach, adopted here, 
is to present a graph of the cumulative distribution of observed M values. 

The PASS program STATSUM computes the best correct and best incorrect 
path through the graph of each utterance, and also the contribution of each 
information source to the cost difference between these two paths, i.e., the 
M value defined above. 

Figures 7 and 8 show the M distributions for each information source, for 
MWG and LHN, respectively. From these graphs one can obtain at a glance such 
indicators as the fraction of cases wherein the information source was counter- 
productive (i.e., the fraction of cases where M is negative) and such qualita- 
tive features as evidence of peculiar clusters of cases. 

Since M can be interpreted as an estimate, computed in MINT, of the logarithm 
of the likelihood ratio for the correct path being in fact correct, M values 
can be translated into odds that the correct path is in fact correct. For 
example, an M value of 147.4 corresponds to an estimate in MINT that, according 
to that particular information source, the odds are 10-to-1 that the 
correct path, rather than the best incorrect one, is in fact correct. Odds 
values are indicated in Figures 7 and 8. 
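
A minimal sketch of the translation from M to odds is given below. It assumes that M is proportional to the logarithm of the odds and fixes the scale from the single correspondence quoted above (147.4 for odds of 10-to-1). The constant and function names are hypothetical, and the actual scaling used inside MINT is not stated here.

```python
# ASSUMPTION: scale fixed by the one point quoted in the text,
# M = 147.4 corresponding to odds of 10-to-1.
M_PER_FACTOR_OF_TEN = 147.4


def m_to_odds(m: float) -> float:
    """Convert an M value to estimated odds that the correct path is correct."""
    return 10.0 ** (m / M_PER_FACTOR_OF_TEN)


odds_at_quoted_value = m_to_odds(147.4)  # the quoted 10-to-1 case
odds_at_zero = m_to_odds(0.0)            # even odds: the source is equivocal
odds_at_negative = m_to_odds(-147.4)     # the source favors the incorrect path
```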

Another interesting indication of the relative value of an information 
source is the frequency with which it is the most "productive" of all the 
sources considered, in the sense of differentiating most strongly (and 
correctly) between the best correct and best incorrect explanations of an 
utterance, as indicated by the most positive M value. The complementary notion 
is the frequency with which the information source is not the most 
counterproductive one (i.e., does not have the most negative M value). An 
information source which is essentially random and which takes on large values 
would frequently be the most productive and also frequently be the most 
counterproductive, as these terms are defined above. Therefore, both these 
indications of information source quality should be considered simultaneously. 

Using the data produced by STATSUM, it is possible to compute the fraction 
of cases in which each particular information source is the most productive 
and the fraction of cases in which it is not the most counterproductive. 
Figure 9 shows these figures for MWG's and LHN's test data, in the form of 
two-dimensional plots. The same data are given in tabular form in Figure 10. 
As can clearly be seen in Figure 9, there is a definite consistency in the 
productivity of each information source for both speakers, with the single 
exception of the association information source. If one uses as a measure of 
the utility of an information source the sum of the frequencies with which it 
is most productive and not least productive, the following ranking (best to 
worst) of information sources holds for both speakers: 

a. Interword Timing 

b. Violation Category 

c. Intraword Timing (QT) 

d. A priori and Intraword Timing (QL) 

with Association somewhere below Violation Category. 
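
The two frequencies and the resulting ranking can be sketched as below. The code is illustrative only: the data layout (one dictionary of M values per utterance), the source labels and the numbers are hypothetical, and ties are resolved in favor of the first source listed.

```python
from collections import Counter


def productivity_fractions(m_by_utterance):
    """For each source, return (fraction most productive,
    fraction NOT most counterproductive) over all utterances."""
    best, worst = Counter(), Counter()
    for m_values in m_by_utterance:
        best[max(m_values, key=m_values.get)] += 1   # most positive M
        worst[min(m_values, key=m_values.get)] += 1  # most negative M
    n = len(m_by_utterance)
    sources = m_by_utterance[0].keys()
    return {s: (best[s] / n, 1.0 - worst[s] / n) for s in sources}


def rank_sources(fractions):
    """Rank sources, best first, by the sum of the two frequencies."""
    return sorted(fractions, key=lambda s: sum(fractions[s]), reverse=True)


# Hypothetical M values for three utterances and three sources.
utterances = [
    {"interword": 50.0, "violation": 10.0, "QT": -5.0},
    {"interword": 20.0, "violation": 30.0, "QT": -40.0},
    {"interword": 60.0, "violation": -10.0, "QT": 5.0},
]
fractions = productivity_fractions(utterances)
ranking = rank_sources(fractions)
```

Considering both frequencies together guards against a source that is merely large and random, which would score well on the first frequency and poorly on the second.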

Another interesting single valued measure of the contribution of each 
information source can be obtained by computing the information contained in 
the distribution of M regarding the selection of the correct path. To apply 
the theory of information to this situation, one can model it as follows. Let 
the two paths contending for choice (the best of the correct and the best of 
the incorrect) be labeled A and B in the order they are discovered in MINT. As 
this is entirely random labeling, the correct path has equal probability of 
being path A or path B. MINT computes the total AQ along the two paths and 
chooses the path with minimum value. The difference in path AQ values (say, 
path A minus path B) will then be distributed as a random variable equal to S 
times M, where S is a random variable with probability one half of being either 
+1 or -1, and M is the difference in AQ values for the incorrect path minus the 
correct path. The product SM is the value available in MINT, which may be 
regarded as a signal received. The message sent is equivalent to designation 
of which path (A or B) is the correct one, or equivalently, whether the value 
of S is +1 or -1. The information content of the signal about the message sent, 
according to Information Theory, is the entropy of the random variable SM minus 
the entropy of the conditional random variable SM given S. This is a value 
lying between zero and one bit of information (one bit being exactly enough 
information to decide perfectly between the two alternatives). If the M value 
were always positive, for instance, one could always recognize the correct path 
as the one with least AQ value. The information content in that case is found 
to be one. If M is distributed symmetrically about zero, its information 
content is zero. 
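
The calculation can be sketched as follows, under the simplifying assumption that the observed M values are treated as a discrete empirical distribution (continuous values would first have to be grouped into bins). Given S, the signal SM is either M or -M, so the conditional entropy of SM given S equals the entropy of M. The function names are hypothetical and this is not the procedure used to produce Figure 10.

```python
import math
from collections import Counter


def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def information_content(m_values):
    """Estimate H(SM) - H(SM | S) from observed M values.

    S is +1 or -1 with probability one half, independent of M.
    """
    n = len(m_values)
    p_m = {m: count / n for m, count in Counter(m_values).items()}
    p_sm = Counter()
    for m, p in p_m.items():
        p_sm[m] += 0.5 * p    # case S = +1
        p_sm[-m] += 0.5 * p   # case S = -1
    # H(SM | S) equals H(M), since SM determines M once S is known.
    return entropy_bits(p_sm.values()) - entropy_bits(p_m.values())


always_positive = information_content([10, 20, 30, 40])
symmetric = information_content([-10, 10, -20, 20])
mostly_positive = information_content([10, 10, 10, -10])
```

The first two cases reproduce the limiting values stated above, one bit and zero bits respectively.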



The information content of an information source, defined above, has several 
interesting properties. One of them is that it establishes an upper bound on 
how successfully an information source can be used to select the right path, 
i.e., to recognize what was spoken, regardless of the algorithm used to effect 
the recognition. 

The information content of each information source has been estimated from 
the observed distribution of M values. The results are given in Figure 10. 
These data tend to corroborate the ranking given to each of the information 
sources earlier. 



The fraction of time that an information source gives a correct, i.e., pro- 
ductive, indication of the right path can be read from the cumulative distribu- 
tion of M values. A positive M value indicates a correct indication, and a 
negative M value an incorrect one. These data are also summarized in Figure 10 
for each information source, and lend further evidence that the information 
source ranking given earlier is correct. (Since there are only eight violation 
categories, and violations are relatively rare, it often happens that the two 
paths do not have potential recognitions which differ in violation category. 
The M value in that case is zero, and the information source is equivocal. The 
frequency with which this occurs is also given in Figure 10.) 

ANALYSIS OF RECOGNITION ERRORS 



Two aspects of recognition error analysis covered here are the automatic 
classification of errors by programs in the PASS and relating recognition errors 
to information sources. The related analyses are discussed below. 

AUTOMATIC CLASSIFICATION OF RECOGNITION ERRORS. In connected speech, many 
possible explanations for the observed speech data are usually generated in an 
attempt to recognize what was actually spoken. When a wrong explanation is 
selected, the cause may be related to a large number of factors. This is 
especially true in an algorithm like MINT which considers the entire complex of 
potential recognitions and all plausible explanations for the entire utterance. 
Classification of errors might at first seem like a hopeless task, as the 
process can apparently go wrong in very many ways. However, when the 
recognition system works even moderately well, most errors are found to belong 
to a small collection of types. Simple deletions, insertions and one-for-one 
substitutions, for example, comprise the majority of all errors. So 
classification, and its automation, is not a hopeless goal. 
A useful dichotomy of recognition failures distinguishes between those 
cases where there does not exist a path through the graph of the utterance 
which yields the correct string of vocabulary items, and those cases where 
there does exist such a path. The former will be called "structural failures." 

Structural failures are generally of two types. One, treated earlier, 
results from MEX's failure to detect the potential presence of a word actually 
spoken. The other type occurs when the correct potential recognitions are 
present in the graph of the utterance, but MINT fails to consider a path 
through them. This can occur only when the interword timing is so anomalous 
as to exceed limits (set in MINT) on the time between potential recognitions 
to be considered potential predecessors. 

The PASS program BIGMINT recognizes structural failures and provides 
data whereby the type of failure involved may easily be determined. 

Misrecognitions involving a source of error other than structural failure 
occur because some incorrect path through the directed graph of the potential 
recognition has lower total cost than any correct path. By considering only 
the best of the correct and the best of the incorrect paths, the locus of the 
difficulty becomes apparent because even in utterances of several words, the 
best of the correct and the best of the incorrect paths usually have much in 
common, the difference existing only at a small portion of the utterance. 

As an example of the simplification obtained by considering only the best 
of the correct and best of the incorrect paths, consider the following. The 
phrase "015." occurs in Test data for MWG. This utterance was misrecognized, 
as there were five paths through the graph of the utterance with lower cost 
than that of the correct path. These five paths corresponded to 201557, 20155, 
015.7, 01557 and 0155, the last one having least cost. Examining the correct 
path (015.) and the best of the incorrect paths (0155) on a node-by-node basis 
shows that they entail the same first three nodes (not obvious from the 
vocabulary items), differing only in the last node. The node-by-node analysis 
shows that this is a case of simple substitution, and the four other incorrect 
paths with costs less than the correct path are not informative, being present 
only because of the anomalous properties of the final node and the end of the 
utterance - anomalies already indicated by the comparison of best correct and 
best incorrect path. This simplification is typical to the point of being 
universal. 

Comparison of the best correct and best incorrect path becomes impossible 
if there is no correct path or no incorrect path. But when at least one 
correct and one incorrect path through the utterance exist, the utterance can 
be further categorized as entailing either a single or multiple branches at 
which the two paths differ. The concept is illustrated in Figure 11. 

When the difference between the best paths is the insertion or deletion 
of contiguous words, the path difference is interpreted as a single branch 
case, as in Figure 11(b). 

The distinction between single branch and multiple branch categories is 
useful because the single branch group is amenable to further subdivision and 
because the multiple branch case is so rare. (It has not been observed to occur.) 
Figure 11. Illustrating Single and Multiple Branch Differences Between 
the Best Correct (Solid) and Best Incorrect (Dotted) Paths 
Through the Graph of an Utterance. (Panels: Single Branch Cases; Multiple Branch Cases.) 



Classifying utterances on the basis of the best correct and best incorrect 
paths thus gives rise to four categories of cases: 

Category 0 No best correct path exists (structural failure) 

Category 1 A single branch distinguishes the best correct and 
best incorrect paths 

Category 2 Multiple branches distinguish best correct and best 
incorrect paths 

Category 3 No best incorrect path exists 

Among the Category 1 cases one may further distinguish cases on the basis 
of the number of nodes in each part of the differentiating branch. Writing 
the number of nodes (words) in the correct branch on the right and the nodes 
in the incorrect branch on the left, a type (0,1) utterance is one in which the 
best incorrect path is formed by deleting one word in the correct utterance. 
If the best incorrect path has, in fact, lower associated cost than the best 
correct path, a simple deletion occurs. Similarly, a type (1,1) utterance is 
one in which the error, or potential error, is a simple substitution. A type 
(1,0) utterance is potentially a simple insertion, and a type (2,3) utterance 
would potentially be a more complex type of substitution. 
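
The Category and Type assignment can be sketched as follows. This is an illustration under stated assumptions, not the STATSUM algorithm: each path is taken to be a list of hypothetical node identifiers (one per potential recognition), shared leading and trailing nodes are stripped, and a node shared inside the remaining segments is taken to indicate multiple branches.

```python
def classify(best_incorrect, best_correct):
    """Return (category, type), where type is (incorrect nodes, correct nodes).

    Paths are lists of node identifiers; None means no such path exists.
    """
    if best_correct is None:
        return (0, None)          # Category 0: structural failure
    if best_incorrect is None:
        return (3, None)          # Category 3: no best incorrect path
    inc, cor = list(best_incorrect), list(best_correct)
    # Strip the nodes the two paths share at the start of the utterance.
    while inc and cor and inc[0] == cor[0]:
        inc.pop(0)
        cor.pop(0)
    # Strip the nodes the two paths share at the end of the utterance.
    while inc and cor and inc[-1] == cor[-1]:
        inc.pop()
        cor.pop()
    # A node still shared means the paths rejoin and split again.
    if set(inc) & set(cor):
        return (2, None)          # Category 2: multiple branches
    return (1, (len(inc), len(cor)))  # Category 1: single branch


# Hypothetical node identifiers; the "015." versus "0155" case from the text.
substitution = classify(["n0", "n1", "n5", "n5b"], ["n0", "n1", "n5", "n."])
deletion = classify(["a", "c"], ["a", "b", "c"])
insertion = classify(["a", "x", "c"], ["a", "c"])
multiple = classify(["a", "x", "c", "y", "e"], ["a", "b", "c", "d", "e"])
structural = classify(["a", "c"], None)
```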

The PASS Program STATSUM classifies all utterances (both those correctly 
and those incorrectly recognized) according to Category and Type as defined 
above, and the program SSPLOT prints various data about each classified group. 
These data can be used to classify any set of utterances, including the subset 
of utterances with errors, as SSPLOT indicates which of the classified 
utterances are misrecognized. Classification of all utterances in this uniform 
way is useful in that it shows both how typical various types of contention 
are in LISTEN, and the relative success MINT has in resolving each type 
of contention. 

The results of classifying test data for MWG and LHN in this way are shown 
in Figure 12. 

The simpler forms of contention are found to be most common in LISTEN. 
MEX's failure to spot the potential presence of a word, and simple insertion, 
deletion or substitution of a single word cover the vast majority of cases 
examined. 

Correct recognition most frequently entails the resolution of which of 
two alternative words to choose; i.e., resolution of a simple one-for-one 
substitution problem. Furthermore, this is a most difficult problem to resolve, 
as indicated by the relatively small fraction of these cases resolved correctly. 
One probable reason for the difficulty in resolving substitution of like numbers 
of words (type (1,1) and (2,2) controversies) is that the strongest information 
source, interword timing, is relatively ineffectual in these cases. 

MISRECOGNITIONS VIS-A-VIS INFORMATION SOURCES. The data produced by STATSUM, 
comparing the best incorrect and best correct paths, permit detailed examination 
of the course of each misrecognition. Quite often one peculiarity of a 
troublesome word stands out in a misrecognized utterance, but the specific 
nature of the peculiarity varies from utterance to utterance. While it is 
easy to identify the most counterproductive information source in individual 
utterances, it is not reasonable to summarize these results for misrecognition 
cases only. An information source may correlate highly with both correct and 
incorrect recognitions, if it has a random component large enough to dominate 
all other information sources. To maintain a balanced view of an information 
source, then, it is important to consider its influence on correct recognitions 
as well as on misrecognitions. This was done in the analysis of the contribution 
of each information source, summarized in the earlier Figures 6 through 10. 

Another indication of the association of errors with information sources 
can be obtained by counting the number of correctly and incorrectly recognized 
utterances, categorized by which was the most productive and which the least 
productive information source. The PASS program STATSUM provides data from 
which these counts can easily be accumulated. Results obtained in that way 
are presented in Figure 13. 
Figure 12. Classification of All Utterances and of Erroneously 
Recognized Utterances, by Category and Type 
Figure 13. Counts of Utterances Correctly and Incorrectly Recognized, 
Categorized by Most Productive (Best) and Least Productive 
(Worst) Information Source. Test data. Category 1, Types 
(0,1), (1,0) and (1,1). 
CRITICAL EXAMINATION OF INFORMATION SOURCE MODELS 

In the decision theoretic model of the problem solved by MINT, each 
information source is considered to provide one component of a complex 
observation of the characteristics of the utterance. Solution of the problem then 
rests on estimating the probability that the particular observed value would 
arise, given various hypotheses about what was actually said. In the MINT 
implementation of this solution, the observed values must be used as a basis 
for estimating the logarithm of the likelihood ratio; i.e., the logarithm 
of the ratio of the conditional probability that the observed value would 
occur, given that the potential recognition is a real one, to the conditional 
probability that the observed value would occur, given the potential 
recognition is an artifact. (AQ_i, as defined earlier.) 

The mechanism for converting an observed value to an estimate of the log 
likelihood ratio entails a statistical model: specifically, a pair of 
conditional distributions of the observable values, given they are descriptions 
of either real or artifactual recognitions. These statistical models contain 
distribution parameters which are estimated from Interim Test data, using 
procedures appropriate to the nature of the data and the statistical models. 
Recognition accuracy and theoretical soundness of the MINT algorithm both 
require that these statistical models and parameters be reasonably 
descriptive of the actual nature of speech data. 

Each information source presents its own difficulties for statistical 
modelling, but three issues can be identified which are of interest in assess- 
ing the validity of each model: 

a. The independent variables must be properly identified. 

b. If a distribution shape has been assumed, it must fairly describe 
the actual shape. 

c. The model must describe statistical characteristics which do 
generalize from Interim Test data to new speech data. 

A PRIORI MODEL. The decision theoretic model of the problem solved in MINT 
requires knowledge of the a priori probability that a particular hypothesis - 
in this context a string of vocabulary items which potentially may have been 
said - will arise. This a priori probability is the probability unconditioned 
by any observation about the acoustic data as received by the preprocessor or 
operated upon by MEX, except that the graph of the utterance admits of the 
string; i.e., contains a path corresponding to the hypothesis. 

It is assumed that the probability that a hypothetical path is in fact the 
correct one, without consideration of any details of the individual potential 
recognitions comprising the path, or of their mutual temporal relationship 
(beyond that required to make them constitute a path through the graph) depends 
only on the vocabulary items in the path. It is further assumed that the 
a priori probability of correctness of the entire path is the product of 
probabilities associated with each vocabulary item in the path. Finally, it 
is assumed that the probability that a particular recognition of a particular 
vocabulary item in a path is in fact real can be estimated from the relative 
frequency of occurrence of that vocabulary item as a real-vice-artifactual 
potential recognition in a large body of speech data. 
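
The chain of assumptions above can be sketched as follows. The sketch reads "relative frequency as a real-vice-artifactual potential recognition" as real / (real + artifactual), which is an assumption, as are the function names and the tallies shown. How MINT converts these probabilities into a cost contribution is not described here and is not modeled.

```python
import math


def prior_real(real_count, artifact_count):
    """Estimate the probability that a potential recognition is real
    (ASSUMED reading: real / (real + artifactual))."""
    return real_count / (real_count + artifact_count)


def path_log_prior(path_items, counts):
    """Log of the a priori probability of a path, taken as the product
    of the per-item probabilities (so the logs add)."""
    return sum(math.log(prior_real(*counts[item])) for item in path_items)


# Hypothetical tallies: item -> (real recognitions, artifactual recognitions).
example_counts = {"zero": (90, 10), "five": (80, 120)}
example_log_prior = path_log_prior(["zero", "five"], example_counts)
```

Working with logarithms turns the assumed product over vocabulary items into a sum along the path, which fits the minimum total cost formulation used for the graph of the utterance.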

Several links in this chain of assumptions are difficult or impossible to 
justify on theoretical grounds, or even to test. In fact, the assumptions are 
rationalizations for the way the a priori contribution to total cost is actually 
computed in MINT, which was chosen because it is plausible and computable 
at small cost in data storage and processing burden. However, the evaluation 
of the a priori information source presented earlier shows that this procedure 
results in a cost contribution which is productive more often than it is 
counterproductive. The whole chain of assumptions is thus justified in that 
it leads to a useful result. Two features of the a priori statistical model 
which are amenable to test and verification are its dependence upon vocabulary 
item, and stability of the relative frequency of real and artifactual potential 
recognition for each vocabulary item type. To verify these aspects of the model, 
the relative frequency of occurrence of real and artifactual potential 
recognitions for each vocabulary item are compared for Interim Test data and 
Test data in Figure 14. 

As Figure 14 shows, there is considerable variation in the relative rate 
of occurrence of artifactual recognition for various vocabulary items, 
justifying the use of vocabulary item as an independent variable. More precisely, 
it is the form of the vocabulary item which is important and which is used as 
the independent variable, in the sense that some vocabulary items exist in an 
initial form and a non-initial form, and different a priori statistics are 
stored for the two forms. 

These data also show the stability of the artifact production rates, 
indicating that rates estimated from Interim Test data remain valid for Test data, 
thus presumably for all new speech data. Of course, artifact production is 
dependent upon vocabulary content and frequency of occurrence of various 
vocabulary items in the corpus of spoken material. Since Test and Interim 
Test data have identical vocabularies and incidence of vocabulary items, the 
generalization from Interim Test data to Test data is justified. However, if 
LISTEN were to be used with a set of utterances wherein each item did not 
occur a substantially equal fraction of the time, new a priori statistics 
should be derived from artifact occurrence rates. 

Data used in the analysis of the a priori statistical model are gathered 
using the PASS program STATSUM and printed using the DOGLEG option in LICVAT. 

VIOLATION CATEGORY MODEL. In the process of detecting the potential presence 
of a vocabulary item in the speech stream, MEX notes several types of 
deviations of the speech from the structure expected of that vocabulary item. 
Eight types of structural violation are recognized, and they are described in 
detail in Reference 1. Each type of structural violation is assigned a 
violation category number, ranging from 1 through 8. Violation category 0 
indicates that no structural violation was detected by MEX. 

Violation category (0 through 8) is regarded in MINT as an observed 
characteristic of the potential recognition, and the probability of occur- 
rence of a given violation category is modeled as depending only upon violation 
category and whether the recognition is real or artifact. Dependence upon 
vocabulary item is suppressed, primarily because very few examples of some 
violation categories are observed for some vocabulary items, making it 
impossible to estimate the rate of occurrence of these violations for 
real recognitions. 

Figure 14. Artifact Production Rates. (The number of artifactual 
recognitions divided by the number of times the 
vocabulary item was spoken.) 
(Figure note: Vocabulary item does not exist in this form for this speaker.) 

Vocabulary Item Dependence for Artifactual Recognitions. The low rate of 
occurrence of violations for real recognitions makes it impractical to 
estimate its vocabulary item dependence with a reasonable sample size. However, 
artifactual recognitions exhibit violations much more frequently, and the 
vocabulary item dependence of their frequency can be estimated. It might, 
therefore, be both practical and useful to use a model of violation occurrences 
which treats violation as independent of vocabulary item for real recognitions, 
and dependent upon vocabulary item for artifactual recognitions. 

To examine this potential improvement of the violation category model, 
the variation of the frequency of occurrence of artifactual violation 
categories with vocabulary item was evaluated, as shown in Figure 15. To prepare 
that figure, the conditional probability that a given violation category would 
occur, given that the recognition is artifactual and of a given vocabulary 
item and form, was estimated using the frequency of that occurrence. The 
maximum and minimum values of the probabilities estimated in that way, over all 
vocabulary items, are shown in the figure. The average probability of 
occurrence of each violation category (for all artifactual recognitions), found by 
ignoring the vocabulary item dependence, is also shown there. 

The data in Figure 15, collected using the DOGLEG option in PASS program 
LICVAT, show that for many violation categories there is a significant 
vocabulary item dependence in the frequency of occurrence. Therefore, extensions of 
this model to include vocabulary item as an independent variable have definite 
potential to increase the effectiveness of this information source. 

Stability. The stability of the rate of occurrence of violation categories 
was evaluated by comparing the frequency of occurrence of violations in Interim 
Test data with their frequency in Test data. The results, also collected using 
the DOGLEG option in LICVAT, are shown in Figure 16. The data show that 
violation occurrence rates can be estimated safely using Interim Test data, for 
both real and artifactual recognitions. 

INTRA-WORD TIMING MODEL. During the recognition process, MEX notes the time 
spent in each state of the recognition automaton. A measure of how typical 
the loop state durations are is accumulated as a linear combination of the 
time spent in each loop state. The resulting value is denoted QL, and is 
treated in MINT as an observation associated with the potential recognition. 
QL is a non-negative number. The coefficients of the linear form used in 
computing QL are obtained from Training data, and the computational procedure 
(forming the linear combination) is based on a model of the joint distribution 
of the individual loop state durations, as described in Reference 1. MINT 
itself uses a model of the distribution of QL values which is quite independent 
of the model upon which the computation of QL is based. 

In MINT it is assumed that QL is distributed exponentially over positive 
values, with a mass concentration at zero, in a way which depends upon the 
vocabulary item (type and form), and whether the potential recognition is real 
or artifactual. The parameters of the distribution (the probability that the 
QL value is zero, and the parameter of the exponential part) are estimated from 
the distribution of QL values observed over Interim Test data. 

Figure 15. The Range of the Conditional Probability of Occurrence 
of Violation Categories for Artifactual Recognition 
Over Vocabulary Items. (Test data used in both cases.) 



If QL is in fact distributed in the modified exponential manner assumed, 
the computation of the log likelihood ratio performed in MINT is accurate. It 
is therefore of interest to determine the validity of this assumption. 

As the mass concentration at zero and the parameter of the exponential 
part of the QL distribution are observed to be vocabulary item dependent, it 
is desirable to normalize observed QL distributions with respect to these 
parameters in order to avoid detailed consideration of two dozen distributions 
for each speaker. For this reason, the QL distributions have been linearized 
as described in the following paragraph. 

A large set of independent samples of a random variable, distributed as 
QL is assumed to be distributed, can be converted to a set of numbers which 
are approximately uniformly distributed in the interval (0,1). To do this, 
first put the QL values in increasing order and assign running index 
i = 1, ..., N to these values. For each i, replace the ith QL value (QL_i) by 

\[
f_i =
\begin{cases}
\text{[expression illegible in source]} & \text{if } QL_i = 0 \\
1 - (1 - P_0)\, e^{-\lambda\, QL_i} & \text{if } QL_i > 0
\end{cases}
\]

where 

P_0 is the probability that QL = 0 
λ is the parameter of the exponential portion of the QL distribution 

If P_0 and λ are in fact the correct parameters of the QL distribution, 
and if QL has the assumed distribution shape, the resulting set of numbers 
approaches a uniform distribution on (0,1) for large N. Correctness of the 
assumed distribution shape, and of the parameters P_0 and λ, can then be checked 
by plotting the cumulative distribution of the f_i values. If the distribution 
shape and parameters are correct, a straight line will result. More 
importantly, sets of QL values for different vocabulary items and for real and 
artifactual recognitions can be converted to sets of f values using the parameters 
appropriate to each set, and the sets of f values can be merged. The resulting 
large set of data will be uniformly distributed on (0,1) if the model and the 
parameters for individual vocabulary items and types of recognition are correct. A 
single graph thus checks the model and parameter validity for the whole vocabulary. 
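
The conversion and the straight-line check can be sketched as follows. Two points are assumptions rather than statements of the STATSUM procedure: the exponential parameter is treated as a rate, and zero QL values are spread evenly over the interval (0, P_0] according to their rank. All names are hypothetical.

```python
import math


def ql_to_f(ql_values, p0, lam):
    """Convert QL values to f values that should be roughly uniform on (0,1)
    when p0 and lam describe the data."""
    ordered = sorted(ql_values)
    n_zero = sum(1 for q in ordered if q == 0)
    f_values = []
    for rank, q in enumerate(ordered, start=1):
        if q == 0:
            # ASSUMPTION: zeros come first in the ordering and are
            # spread evenly over (0, p0] by rank.
            f_values.append(p0 * rank / n_zero)
        else:
            # Cumulative distribution: mass p0 at zero plus exponential part.
            f_values.append(1.0 - (1.0 - p0) * math.exp(-lam * q))
    return f_values


def max_deviation_from_diagonal(f_values):
    """Largest gap between the empirical cumulative distribution of the
    f values and the straight line expected for a uniform distribution."""
    ordered = sorted(f_values)
    n = len(ordered)
    return max(abs(f - (i + 0.5) / n) for i, f in enumerate(ordered))


# Hypothetical QL values and parameters, for illustration only.
example_f = ql_to_f([1.0, 0.0, 0.0], p0=0.5, lam=math.log(2.0))
example_deviation = max_deviation_from_diagonal(example_f)
```

Sets of f values computed with different parameters (one set per vocabulary item and recognition type) can simply be concatenated before the deviation is examined.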

The PASS program STATSUM performs the conversion of QL values to f values 
just described, and the QLPLOT function in LICVAT generates a computer graph 
of the cumulative distribution of the merged f value sets. Results obtained 
in this way are presented in Figures 17 and 18. 
The graph of the cumulative distribution of f deviates significantly 
from a straight line for speaker MWG, but is quite reasonably straight for 
speaker LHN. This is probably due to different procedures used to generate 
the exponential parameters (λ) for each vocabulary item for the two speakers. 
For MWG, the λ values were estimated by visually comparing computer plotted 
cumulative QL distributions with exponential curves of known parameter. The 
distortion noted in the upper right hand portion of MWG's f distributions is 
what would be expected if there were a systematic bias towards estimating too 
high a value for λ. The fact that the lower left portions of those curves 
tend to be straight lines coincident with the graph diagonal indicates that 
the P_0 values are correct. They were estimated as the fraction of observed 
zero values and were thus not subject to human error as were the λ estimates. 
In contrast, both λ and P_0 estimates were derived objectively for LHN's voice, 
using programs in the VDGS. 

The difference just noted between the two speakers' data suggests that 
the QL information source could be improved for MWG by re-estimating the λ 
parameters for each vocabulary item, using the unbiased mathematical procedure. 
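
One objective way of estimating the two parameters is sketched below. P_0 is taken as the fraction of observed zero values, as stated above. The estimate of λ shown (the reciprocal of the mean of the non-zero values, treating λ as a rate) is the standard maximum likelihood estimate for an exponential distribution and is an assumption; the procedure actually used in the VDGS is not described here.

```python
def estimate_ql_parameters(ql_values):
    """Return (p0, lam) estimated from a sample of QL values.

    p0  : fraction of observed zero values.
    lam : ASSUMED estimator, reciprocal of the mean of the non-zero values.
    Requires at least one non-zero value.
    """
    positives = [q for q in ql_values if q > 0]
    p0 = (len(ql_values) - len(positives)) / len(ql_values)
    lam = len(positives) / sum(positives)
    return p0, lam


# Hypothetical QL sample, for illustration only.
example_p0, example_lam = estimate_ql_parameters([0.0, 0.0, 1.0, 3.0])
```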

These graphs indicate that the assumed exponential shape fits the 
distribution of non-zero QL values quite well. If the data were distributed in some 
other way, with the parameter λ chosen to obtain best fit to the data, the 
curves would have an ogival shape rising above and falling below the graph 
diagonal in the upper right hand portion of the graph. The graphs also indicate 
that the parameters obtained from Interim Test data are descriptive of other 
speech data, as indicated by the similarity of the curves for Interim Test and 
Test data. Thus the QL statistical model appears stable. 

ASSOCIATION MODEL. In an effort to exploit the fact that speaking certain 
vocabulary items may have a tendency to cause artifactual recognition of 
another vocabulary item, MINT detects and uses the temporal association of 
potential recognitions. If there is significant asymmetry in the rates of 
artifact production (for example, if speaking "five" usually causes artifactual
recognition of "nine," while speaking "nine" seldom produces artifactual
recognition of "five"), association may carry information useful for recognition.

A set of associated vocabulary items and forms is ascribed to each poten- 
tial recognition for this purpose. A vocabulary item is associated with a 
given potential recognition if there is another potential recognition of that 
vocabulary item which overlaps sufficiently in time. The required amount
of overlap (called the association criterion) is determined as described in 
Reference 2. Only the existence or non-existence of associated recognitions 
of each vocabulary item is noted, not their number. 

The probability that a potential recognition will have an associated 
potential recognition of given vocabulary type is assumed to depend upon both
vocabulary items and forms, and whether the former recognition is real or 
artifactual. 

The PASS program STATSUM tallies the number of times each vocabulary item
and form is found to be associated with real and artifactual recognitions of
each vocabulary item and form, and the LICVAT option DOGLEG prints these data.
The probability that a real (or artifactual) recognition of given vocabulary
item and form will have an associated recognition of given vocabulary item and
form can be estimated directly from these tallies. The association data can
contribute to correct recognition when the probabilities for real and
artifactual recognition differ significantly. The natural measure of this
potential is the likelihood ratio.

The assumed dependence upon vocabulary item of the associated recognition
is shown to be factual by the data in Figure 19. These data show the
estimated probability that various vocabulary items and forms will be associated
with real and artifactual recognitions of the word "five," as determined for
MWG test data. The likelihood ratio is seen to vary widely from unity.
However, if one averages over vocabulary items, it is found that both real and
artifactual fives have the same probability (.21) of having an associated
recognition of unspecified type, resulting in a likelihood ratio of one, and
no information for distinguishing real from artifactual recognitions. Similar
results can be demonstrated for vocabulary items other than "five." Including
vocabulary item dependence in the association model is, therefore, necessary
in order to extract the available information.
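
The effect described above can be illustrated with a minimal sketch. The
tallies, the item names, and the function name below are hypothetical and are
not taken from Figure 19 or from STATSUM; the pooled figure also assumes that
a recognition has at most one associated vocabulary item, so that tallies can
simply be summed. Under those assumptions, item-specific likelihood ratios
differ widely from unity while the pooled ratio is exactly one.

```python
def likelihood_ratio(real_assoc, real_total, art_assoc, art_total):
    """Ratio of estimated association probabilities (real over artifactual)."""
    p_real = real_assoc / real_total
    p_art = art_assoc / art_total
    if p_art == 0:
        return float("inf")
    return p_real / p_art


# Hypothetical tallies: how often each associated item was seen alongside
# real and artifactual recognitions of one word, out of 100 cases of each.
REAL_TOTAL = 100
ART_TOTAL = 100
tallies = {
    "item A": {"real": 16, "artifact": 5},
    "item B": {"real": 5, "artifact": 16},
}

# Item-specific ratios carry information (far from one in both directions).
per_item = {
    name: likelihood_ratio(t["real"], REAL_TOTAL, t["artifact"], ART_TOTAL)
    for name, t in tallies.items()
}

# Pooling over associated items discards that information.
pooled = likelihood_ratio(
    sum(t["real"] for t in tallies.values()), REAL_TOTAL,
    sum(t["artifact"] for t in tallies.values()), ART_TOTAL,
)

for name, ratio in per_item.items():
    print(name, round(ratio, 3))
print("pooled", pooled)
```

With these invented tallies both pooled probabilities equal .21, matching the
situation described for "five," although the individual counts are purely
illustrative.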

The data of Figure 19 also show that association of a potential
recognition of "five" with another recognition of any vocabulary item other than
"nine" yields information useful in distinguishing real from artifactual
recognitions. The exception is unfortunate, as "five"/"nine" discrimination is
difficult.

The stability of association statistics can be demonstrated by comparing
association frequencies observed for Interim Test and Test data. These
frequencies are shown in Figure 20 for LHN's enunciations of "point." The
frequency observed in Test data is plotted against the frequency observed in
Interim Test data to facilitate comparison.

INTERWORD TIMING MODELS. MINT uses three different models related to the 
relative time of occurrence of potential recognitions within an utterance. 
These three models treat the delay between the start of the utterance (sound 
detected by the preprocessor) and the beginning of the first recognition, the 
gap or overlap between successive words of the utterance, and the delay between
the recognition of the last word of the utterance and the cessation of sound.

Initial Delay Model. The delay between the beginning of the utterance and the
start time of the recognition of the first word in the utterance is assumed to
be distributed exponentially over positive values, with a mass concentration
at zero. (This distribution was suggested by examining many cases during
LISTEN's development.) The probability of a zero value and the parameter of
the exponential portion of the distribution are assumed to be dependent upon
vocabulary item and whether the recognition is really the first word spoken in
the utterance or not. (Thus for the initial delay model, recognition of the
second word actually spoken is an artifactual recognition of the first word
spoken.)

Variation of the distribution parameters with vocabulary item, and
stability of these statistics, are revealed by comparing estimates of the
parameters derived from Interim Test data with estimates taken from Test data.
Data for computing these estimates are provided by the PASS program STATSUM, and
printed by the GAP DATA option in LICVAT.
Figure 19. Estimated Probabilities and Likelihood Ratios that Potential Recognitions of Various
Types Will Be Associated with Real and Artifactual Recognitions of the Word "Five,"
MWG Test Data.
Figure 20. Comparison of the Frequency with which Various Vocabulary Items
Are Associated with Recognition of the Word "Point" in Interim
Test and Test Data for Speaker LHM. (Horizontal axis: Frequency of
Association Observed in Interim Test Data.)
Figure 21 shows the frequency with which zero initial delay was observed
for all vocabulary items, in Interim Test data and Test data. The wide
variation of these ratios for various vocabulary items indicates the importance
of using vocabulary item as an independent variable. This figure also shows
that the variation in rate of occurrence between vocabulary items is greater
than the variation from Interim Test to Test data, further validating the
dependence upon vocabulary item, and showing the stability of the statistics.
The wide disparity in frequency of zero delay observed for real and artifactual
recognition, and hence the utility of this information source, is also
apparent in this figure.

Figure 22 shows the mean of the non-zero initial delay observed in Interim
Test and Test data for all vocabulary items. The reciprocal of this value is
the maximum-likelihood estimator of the exponential distribution parameter.
The time unit is one "count," the period of the interrupt signal from the
preprocessor, which is approximately two milliseconds. These data indicate
several interesting characteristics of the non-zero initial delay distributions.
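
The two parameters of the initial delay model can be estimated as sketched
below. This is a minimal illustration rather than the STATSUM computation: the
function name and the delay values are hypothetical, and the two-millisecond
count is the approximate figure quoted above. The probability of zero delay is
taken as the fraction of observed zero values, and the exponential parameter
as the reciprocal of the mean non-zero delay.

```python
COUNT_MS = 2.0  # approximate duration of one preprocessor "count"


def fit_initial_delay(delays):
    """Estimate (p_zero, mean_nonzero, rate) from delays given in counts.

    mean_nonzero and rate are None when no non-zero delay was observed,
    which is the small-sample situation discussed in the text.
    """
    if not delays:
        raise ValueError("at least one observed delay is required")
    nonzero = [d for d in delays if d > 0]
    p_zero = (len(delays) - len(nonzero)) / len(delays)
    if not nonzero:
        return p_zero, None, None
    mean_nonzero = sum(nonzero) / len(nonzero)
    return p_zero, mean_nonzero, 1.0 / mean_nonzero


# Hypothetical delays, in counts, for one vocabulary item.
observed = [0, 0, 0, 4, 6, 0, 10, 0, 0, 0]
p_zero, mean_nonzero, rate = fit_initial_delay(observed)
print("P(zero delay):", p_zero)
print("mean non-zero delay (ms):", mean_nonzero * COUNT_MS)
print("exponential parameter (per count):", rate)
```

Only three of the ten hypothetical delays are non-zero, which shows how quickly
the sample available for estimating the exponential parameter shrinks when
zero delay is common.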

First, non-zero initial delays are very much larger for artifactual than 
for real recognitions. The only exceptions to this rule are vocabulary items 
of initial form; for those vocabulary items, the non-zero initial delays for 
real and artifactual recognitions are comparable. (This is because potential
recognition of the initial form of a vocabulary item is only allowed by MEX
to start in the first fifty or so milliseconds of the utterance.) If the
distribution of non-zero initial delays is in fact exponential, this indicates
that artifact non-zero initial delays are distributed essentially uniformly 
in the interval where there is any reasonable probability of a delay being 
due to a real recognition. 

Second, among non-initial artifactual vocabulary items, the variation of
mean non-zero initial delay with vocabulary item is not a large fraction of
the average value, and is comparable to the variability between Interim Test
and Test data. Combining this fact with the first observation, it appears that
the initial delay model could be simplified by assuming non-zero initial delays
for artifactual recognitions are distributed uniformly over the region of
interest, with a density which is independent of vocabulary items. From a
computational point of view, however, it turns out to be simpler to retain the
assumption that the distribution is exponential rather than uniform, but with a
distribution parameter which is independent of vocabulary item.

Third, the stability of the non-zero initial delay distribution for real
recognition and artifactual recognition of initial form is suspect. This is
almost certainly a problem of sample size, as several vocabulary items have
high probability of zero initial delay, leading to very few cases of non-zero
delay from which to estimate the mean. For example, in the corpus of
utterances used in this project, each data set (Training, Interim Test and Test)
contains thirty occurrences of each vocabulary item in the initial position
(including the six cases where the item is spoken in isolation). If the
probability of zero initial delay is 0.8, the expected non-zero delay sample
size is six. An extended study in this area might reveal an appropriate
simplification of this portion of the initial delay model as well.
Figure 21. Graphs Showing Frequency of Zero Initial Delay.
Final Delay Model. The interval between the time of recognition of the last
word spoken in an utterance and the preprocessor's detection of the cessation
of speech is assumed to be distributed exponentially, with distribution
parameter depending upon vocabulary items and whether or not a potential
recognition is really the last word spoken. (Thus "artifactual last words"
include all artifactual recognitions and all real recognitions of words other
than the last.) The PASS program STATSUM accumulates end delay values and
averages those data for each vocabulary item and recognition type, and the GAP
DATA option of LICVAT prints the results. These data are shown in Figure 23 for
all vocabulary items. (The reciprocal of the average delay is the
maximum-likelihood estimator of the exponential distribution parameter.) Two
different scale factors have been used in this figure to increase visibility of
certain features of the data. The unit of time used is one "count," about two
milliseconds.

These data show that the final delay has significant vocabulary item
dependence, and that the variation with vocabulary items is considerably
larger than the variation from Interim Test data to Test data, for both real
and artifactual recognitions. Therefore, unlike the initial delay model, the
final delay model cannot be simplified by suppressing vocabulary item
dependence without sacrificing information.

Interword Gap Model. The time interval between the end (recognition time) of
one potential recognition and the beginning (start time) of another is assumed
to be distributed in a symmetric limited exponential manner. That is, the
probability density, as a function of the interword gap g, is assumed to be of
the form:

\[
p(g) =
\begin{cases}
\dfrac{1}{4d} & \text{if } |g-u| \le d \\[1ex]
\dfrac{1}{4d}\, e^{\,1 - |g-u|/d} & \text{if } |g-u| > d
\end{cases}
\]

where u and d are parameters of the distribution. These parameters are assumed
to depend upon the vocabulary item and form of the two potential recognitions,
and on whether they are really recognitions of contiguous spoken words taken
in correct order, or otherwise. The time interval between two potential
recognitions is thus considered artifactual if the first is treated in MINT as a
potential predecessor of the second, but they are not both real recognitions
of contiguously spoken words.
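
The shape of this density can be checked numerically. The sketch below is an
illustration only: the function names, the parameter values (u = 20, d = 10
counts) and the simple midpoint integration are assumptions made for the
example. It evaluates the symmetric limited exponential density and confirms
that half of the probability lies in the uniform portion and one quarter in
each exponential segment.

```python
import math


def gap_density(g, u, d):
    """Symmetric limited exponential density with centre u and half-width d."""
    x = abs(g - u)
    if x <= d:
        return 1.0 / (4.0 * d)
    return math.exp(1.0 - x / d) / (4.0 * d)


def integrate(func, a, b, steps=20000):
    """Midpoint-rule integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


u, d = 20.0, 10.0  # hypothetical parameters, in counts


def density(g):
    return gap_density(g, u, d)


uniform_mass = integrate(density, u - d, u + d)
# The exponential segment is truncated at 40 half-widths; the omitted
# remainder is negligible.
upper_mass = integrate(density, u + d, u + 40.0 * d)
lower_mass = integrate(density, u - 40.0 * d, u - d)

print(round(uniform_mass, 4), round(lower_mass, 4), round(upper_mass, 4))
```

The density is continuous at |g-u| = d, where both branches equal 1/(4d).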

With a dozen vocabulary items, this model requires considering a gross of
vocabulary item pairs. Since each Magic Number Set of 55 utterances
contains each sequential pair of vocabulary items exactly once ("point-point"
was excluded), the Training, Interim Test and Test data sets each contain six
examples of each interword gap distinguished by the model. Statistical sample
size is thus a serious problem in estimating the distribution parameters u and
d for each pair of vocabulary items.
The small number of available interword gap samples also makes it
difficult to validate treating vocabulary items as an independent variable in the
interword gap model. Some justification for considering vocabulary items in
modelling real interword gaps can be taken from the fact that data tend to
have certain trends which would be expected on a phonological basis. For
example, in the "six-six" case, one expects overlap due to the identical
terminal and initial sound of the words involved. Similarly, one expects overlap
of word pairs which share a stop, such as "eight-two." Word pairs which entail
dissimilar sounds at their juncture, such as "seven-point," are expected to
have larger than average interword gaps. The observed mean values of
interword gaps, subjectively at least, seem to exhibit many of these anticipated
tendencies, as is demonstrated in Figure 24. These data were obtained by the
VDGS program '^PSTER, for speaker MWG's Interim Test data.

It is much less likely that vocabulary item dependence should be
considered in the distribution of artifact interword gaps, since the phonological
arguments cannot be applied to artifactual recognitions or non-contiguous real
recognitions. Little would probably be lost by simplifying the model by
suppressing this dependence, but it is impossible to demonstrate that as fact
with available data.

In an attempt to evaluate the stability of interword gap statistics, and
validate the assumed distribution shape, the following procedure was used
to normalize interword gap data. A derived random variable, f, can be
computed from the observed random gap values, using the known parameters of
the distribution. If f is related to g by

\[
f = \int_{-\infty}^{g} p(x)\, dx
\]

where p(x) is the probability density assumed for the gap data, f will be
uniformly distributed on the interval (0, 1), provided the assumed distribution
is correct. By using the parameters u and d appropriate to the instance of g,
all f values can be merged and their cumulative distribution plotted.
If the assumed distribution shape and parameters are correct, a straight line
results.
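
The normalization just described can be sketched as follows, assuming that
the normalizing function is the cumulative distribution of the symmetric
limited exponential density given earlier for the interword gap model. The
function names and the sample values are hypothetical. Each gap is converted
to an f value using its own u and d, the f values are pooled and sorted, and
the largest departure of their cumulative distribution from the diagonal is
reported.

```python
import math


def gap_cdf(g, u, d):
    """Cumulative distribution of the symmetric limited exponential density."""
    x = (g - u) / d
    if x < -1.0:
        return 0.25 * math.exp(1.0 + x)
    if x <= 1.0:
        return 0.5 + 0.25 * x
    return 1.0 - 0.25 * math.exp(1.0 - x)


def normalized_gaps(samples):
    """Map (g, u, d) triples to sorted f values, pooling all item pairs."""
    return sorted(gap_cdf(g, u, d) for g, u, d in samples)


def max_deviation_from_diagonal(f_values):
    """Largest gap between the empirical cumulative curve and a straight line."""
    n = len(f_values)
    return max(
        max(abs((i + 1) / n - f), abs(i / n - f))
        for i, f in enumerate(f_values)
    )


# Hypothetical gaps (in counts), each with the parameters of its own pair.
samples = [
    (12.0, 20.0, 10.0),
    (25.0, 20.0, 10.0),
    (-3.0, 5.0, 8.0),
    (41.0, 30.0, 6.0),
    (30.0, 30.0, 6.0),
]
f_values = normalized_gaps(samples)
print([round(f, 3) for f in f_values])
print(round(max_deviation_from_diagonal(f_values), 3))
```

A small maximum deviation corresponds to a cumulative plot lying close to the
graph diagonal; a systematic bow above or below the diagonal indicates that
the assumed shape or parameters do not describe the data.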



The PASS program STATSUM computes the normalizing function f, and an
option in program LICVAT generates a computer plot of the cumulative
distribution. Graphs taken from the computer plots are presented in
[figure number illegible in source]. Interpretation of these data is discussed
in the following paragraphs.

The data show that f is not uniformly distributed [text illegible in
source]. This can be explained by the fact that [text illegible in source]






                            Following Word
Leading
Word       0    1    2    3    4    5    6    7    8    9    •

  0       36   27   37   40   72   72   49   40   61   36   56
  1        3   12    1   27   41   56   19   12   20   13   33
  2        9   47   22   20   66   68   17   37   42   35   60
  3       56   54   42   60   50   86   60   53   63   75   68
  4       25   11   20   36   39   58   37   17    8   13   24
  5      -11   25   19   16   21   30   -2   -1    9   22   34
  6       -3   35   29   23   28   46   -2   -8   19   46   27
  7       29   25   21   21   41   54   21   29  -27   -5   40
  8       13   23  -17   23   21   40   29    7   12   38   24
  9       28   30   26   44   39   67   33   31   23   38   41
  •       23   18   20   23   35   50   19   19   18   J3

Figure 24. Mean Interval Between End of Recognition of
One Word and Beginning of Recognition of
Succeeding Word, Speaker MWG
of samples are available. As a result, the parameters for several vocabulary
item pairs describe a distribution which is wider than the data indicate,
leading to fewer than expected f values near zero and one. Another factor
which may be contributing to the paucity of real gap cases at the extremes
would be that the distribution shape has too much weight in the exponential
portions, vice the uniform portion. (The adopted distribution has twenty-five
percent of its mass in each exponential segment.) No reasonable explanation
is available for the greater deviation of the f distribution from linearity
observed for speaker MWG than for speaker LHN.

The distribution of real gaps is very stable with respect to speech
sample, as can be seen by comparing the f distribution for real gaps shown for
Interim Test and Test data. This stability, and the similarity of the
distributions obtained for the two speakers, suggests that substantial
improvement in the modelling of real gaps can be obtained by reducing the small
sample protection bias toward large d values, and perhaps changing the assumed
distribution shape by reducing the mass in the exponential portions.

The f distribution graphs for artifact gaps show a deviation from
linearity which is the reverse of that observed for real gaps. In each case, more
than the expected number of f values are found near zero and one, and fewer
near middle values. This is the result to be expected when the gaps are
actually distributed more or less uniformly over a broad interval, including
values where the density is modelled as decreasing exponentially. It is a
clear indication that the assumed distribution shape is not appropriate for
artifact gaps. As the distribution width (indicated by the parameter d) is
much greater for artifacts than for real gaps, a superior model for artifact
gaps would result from assuming that artifact gaps are uniformly distributed
over an interval containing almost all real gaps. The almost linear portion
of the f distribution near middle f corresponds to time values covering the
region of interest for real gaps, so this linearity indicates that the locally
uniform assumption is a good one.

Stability of the gap statistics is also indicated by the similarity of
the artifact f distribution for Interim Test and Test data. This is another
indication that improvement in the artifact gap model may significantly improve
use of the gap information source.

This analysis reveals a tendency to underestimate real gap densities, and
overestimate artifact gap densities, at middle f values. This results in a
considerable underestimation of the likelihood ratio, and too little cost
advantage being assigned for gaps observed in this region. For extreme f
values, the density of real gaps is overestimated and the density of artifact
gaps is underestimated, leading to overestimation of the likelihood ratio.
As a result, extremely short and long gaps are not penalized by high cost to
the extent they should be. The net effect of these model inadequacies is to
underemphasize the gap information source by assigning costs which partially
mask the true significance of typical and atypical gaps alike. This is a very
interesting result in view of the fact that gap data are an important part of
the interword timing information source, and this information source has been
found to be the most productive information source used in LISTEN.
Improvement of the gap model would then seem to offer significant potential for
improving LISTEN's performance.
SECTION V



SUMMARY OF RESULTS AND CONCLUSIONS



SUMMARY OF RESULTS 

The VIAS project is a continuation of the NAVTRAEQUIPCEN's exploratory
development program for automated speech technology. It has contributed to
that program by developing a working system suitable for laboratory concept
development in the area of limited connected speech recognition which is
readily modified for research purposes. This system permits the variation of
parameters and evaluation and analysis of effects upon recognition results.
Consideration has been given to increasing the number of speakers, automating
the process of reference pattern creation, expanding vocabulary size, and
transferring technology to a new preprocessor, all within the context of
real-time recognition.

Specific results achieved by this project are summarized below.

TRANSFER OF TECHNOLOGY. The real-time connected speech recognition system
LISTEN has been modified to operate successfully with a new model of speech
preprocessor.

EXTENSION TO NEW SPEAKERS. It has been demonstrated that LISTEN can achieve
connected speech recognition accuracies in excess of ninety percent (word
basis) for a new speaker.

EXAMPLE SET GENERATION. The importance of the method of generating sets of 
individual vocabulary items used in creating voice reference data has been 
demonstrated. 

VOICE DATA GENERATION SYSTEM (VDGS). A unified body of computer programs for
generating voice reference data has been developed. These programs automate
the voice reference data creation process to the full extent practicable at
this time. These programs exist in two forms: as an almost autonomous
sequence of programs requiring an absolute minimum of human intervention, and
as a collection of individual programs which can be exercised independently
for research purposes. A detailed users manual has been provided for both
versions of this system of programming.

PERFORMANCE ANALYSIS SUBSYSTEM (PASS). A useful, powerful, and convenient set
of programs has been developed and exercised for analyzing the overall
performance and many technical details of LISTEN's operation. A users manual
also has been provided for using these programs.

VOCABULARY EXPANSION. The number of vocabulary items which can be accommodated 
by various VDGS programs has been increased toward the desired goal of thirty. 
The goal has been reached for several of those programs, and no fundamental
barrier exists to reaching it for all programs.
ANALYSES OF LISTEN PERFORMANCE. Programs of the PASS have been used to analyse
the significance of the several information sources LISTEN uses to obtain
recognition. It has been found that these information sources vary considerably
in their utility for recognition. Methods of automatically classifying and
analysing recognition errors have been developed and used. Among the many
findings, it has been shown that most recognition errors result from failure to
select the correct alternative in a simple substitution decision.
The statistical models used to represent the information sources have been
examined critically with a variety of results. While the models have generally
been shown to be effective, several specific modifications to simplify data
collection or improve model fidelity (and recognition accuracy) have been
suggested.

CONCLUSIONS 

Results obtained in the VIAS project support four conclusions of general
interest, as discussed in the following paragraphs.

MAGNITUDE OF THE VOICE REFERENCE DATA GENERATION BURDEN. Producing the VDGS 
was a major task, due to the number and complexity of the procedures used to 
produce voice reference data needed by the LISTEN real-time recognition
programs. Using the VDGS to produce voice reference data for new speakers also
requires a considerable amount of computer time and labor. These facts have
made clear the important role that reference data generation requirements may 
have in determining the practicality of applying a connected speech recognition 
capability in a training environment. 

LISTEN was developed with primary emphasis on real-time operation and
exploitation of all information which might be present in the preprocessor
output, and essentially no concern with the voice reference data production
burden. Now that much has been learned about the nature of the information 
present in the preprocessor output, the opportunity exists to reformulate the 
recognition and reference data extraction processes in a way which will main- 
tain or improve recognition performance while minimizing the reference data 
production burden. 

INFORMATION IN THE PREPROCESSOR OUTPUT. Analyses performed using the PASS pro- 
grams have verified the presence, and elucidated the nature, of information 
sources in the preprocessor output. Models of those sources posited during 
LISTEN's development have been validated to varying degrees, but the validity
of the models is secondary in significance to the fact that those information 
sources have been isolated and demonstrated by objective means to be present 
and to have utility for recognizing connected speech. 

THE ANALYTIC APPROACH. Equally significant is the fact that the approach used
in this project to evaluate LISTEN's performance has led to analytic procedures
which reveal the character and relative value of different sources of
information in a preprocessor's output. This approach is data intensive and
costly in terms of computer processing requirements for developing and
exercising the analysis programs, but it is the only approach which yields
concrete knowledge of the information sources. As demonstrated in this report,
knowledge of the information sources can clearly indicate improvements in the
recognition or reference data generation procedures.

POTENTIAL. The analyses described in this report indicate only the potential
to improve LISTEN's recognition accuracy.
APPENDIX A

VOICE DATA GENERATION SYSTEM USERS MANUAL

A.1 GENERAL

The object of this appendix is to describe in some detail the use of
the Voice Data Generation System (VDGS). Specifically discussed will be
what is involved in the process which begins with the extraction of voice
samples from the speaker and ends with the creation of a MIND file. The
individual programs of VDGS also will be described. The main body of this
appendix describes two methods of using the VDGS programs to prepare the
MIND file which is necessary for the operation of LISTEN.

The operating environment for which the VDGS software has been
prepared is one in which there is available a Data General S-130 minicomputer
running under RDOS with a Threshold TTI-500 voice preprocessor and
standard peripheral devices. The executable files for each of the individual
VDGS routines are intended to function on the S-130. However, the VDGS
software also will operate on a Nova 3 minicomputer, provided all routines
are recompiled and all programs reloaded.

Program descriptions for VDGS routines are presented in A.8.

File descriptions for VDGS user-created files are presented in A.9.

Data files and compile and load macros are tabulated in A.10.

A.2 THE TWO METHODS OF VDGS

Before LISTEN can perform limited continuous speech recognition of a
given speaker's voice it is necessary to construct a MIND file. A MIND
file is a file containing the concentrated statistical essence of a voice,
and many routines (twenty-four) are required to create it. We will
describe two different methods for using these twenty-four routines to create
the MIND file. One method, which we will refer to as the chain method, is
to use one routine and two command macros to execute all twenty-four
routines with operator intervention required at only one point. The other
approach, which we will call the step-through method, requires an operator
to execute each program separately and engage interactively with the
programs. The chain method has some limitations which will be described
later, but it essentially runs by itself. The step-through method is more
flexible, but it requires relatively extensive operator input. Whichever
method is chosen, it must be followed through to the creation of the MIND
file. In the sequel we will describe both methods for using the LCSR
statistical preprocessing package.

A.3 THE VDGS CHAIN

INTRODUCTION TO CHAINMIND 

The approach to using the VDGS software which is simplest, in the
sense of requiring the least input from the operator, is embodied in
CHAINMIND, the VDGS chain. CHAINMIND has three parts: (1) the extraction
and compression of voice samples, (2) the creation of example spaces
and transition letter sets, and then (3) all the rest of statistics
gathering and statistical processing, including the building of the MIND file.

The first part of CHAINMIND is accomplished by the program EXTRACT
which prompts the user to speak, extracts raw voice data from the TTI-500
preprocessor and compresses the data to a form usable by the remaining
VDGS routines. The operation of EXTRACT is described later and some
further comments about its use are included in A.4.

The second part of CHAINMIND is GENTL, a small chain consisting of
the programs ESG and GZEC. ESG creates eleven example spaces, one for
each of eleven vocabulary items; GZEC creates transition letter sets, also
one for each item. The operation of GENTL is also described in detail
later.


The third part of CHAINMIND is MAKEMIND, a chain of the remaining 21
VDGS programs.

USE OF CHAINMIND

To begin CHAINMIND, first use EXTRACT to create compressed data files
for all utterances in eighteen magic number sets, MNSETA through MNSETR.
This can be done over a period of time at the user's convenience. Probably
not more than six magic number sets at the most (330 utterances) should be
spoken at a sitting.

After all eighteen magic number sets of utterances have been spoken,
the chain GENTL can be run. To do this make sure that the files ZESG.SV
and ZGZEC.SV are on the speaker's directory (this directory should have a
three letter name, and should hold all the compressed data files) as well
as the data files PPILK and WIZ.ST and the command file GENTL. Having
done that, type @GENTL@, and the example spaces ES$XXX$** and temporary
transition letter set files TRLS**.TM and TRIX**.TM will be created.

When GENTL is finished, the user must intervene to pick the best
transition letter set for each item. The procedure for choosing the best
transition letter sets is described later when the program RESCUE is
discussed. When the best transition letter sets have been determined and
their "RESCUE indices" found, the user should create a file called REDEEM
with the editor. The user then should enter into the REDEEM file the
eleven RESCUE indices, in order from the first vocabulary item to the
eleventh, one per line, in I2 format.
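
The layout of the REDEEM file can be illustrated with a short sketch. This is
not part of the VDGS; the function name and the index values are hypothetical,
the line terminator expected by RDOS is not specified here and a plain newline
is assumed, and indices are assumed to be non-negative. I2 format is taken to
mean an integer right-justified in a field two characters wide.

```python
def format_redeem(indices):
    """Return REDEEM file text: eleven indices, one per line, I2 format."""
    if len(indices) != 11:
        raise ValueError("exactly eleven RESCUE indices are required")
    lines = []
    for index in indices:
        if not 0 <= index <= 99:
            raise ValueError("index does not fit a two-character field")
        lines.append(f"{index:2d}")  # right-justified, width two
    return "\n".join(lines) + "\n"


# Hypothetical RESCUE indices for vocabulary items one through eleven.
example_indices = [3, 12, 1, 7, 4, 10, 2, 5, 9, 6, 8]
redeem_text = format_redeem(example_indices)
print(redeem_text, end="")
```

Single-digit indices are padded with a leading blank so that every line is
two characters wide.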

Once the file REDEEM has been created, the third section of CHAINMIND
can be run. Again, all the MAKEMIND executable files must exist on the
speaker's subdirectory, together with all the compressed data files, the
magic number set files, and the file REDEEM. Then the user must create a
file called WHERE with a single entry of the form "disk unit:subdirectory"
indicating where the speaker's counter data files will reside, e.g.
DP2:USG. To continue, type @MAKEMIND@, and (after 20-25 hours) the
MIND.VD file is created.
In summary, the voice technician supervising the operation of
CHAINMIND should proceed like this:

a. Make sure all the CHAINMIND routines and data files exist on the
speaker's subdirectory. (See Table A1)

b. Run EXTRACT to create the compressed data files.

c. Run @GENTL@ to create example spaces and transition letter sets.

d. Pick the best transition letter sets and create the REDEEM file.
Also create the WHERE file.

e. Run @MAKEMIND@ to execute the remaining VDGS routines and create
the MIND file.

SOME COMMENTS AND CAVEATS 

The CHAINMIND method of VDGS processing has some rigidities and
limitations which must be pointed out.

a. CHAINMIND is limited to the use of eleven machines and does not
allow the option of creating universal machines for the special handling
of initial words in an utterance.

b. The magic number sets to be used for training, interim test, and
test data are fixed in CHAINMIND. The sets MNSETA through MNSETF are used
for training data, MNSETG through MNSETL for interim test data, and MNSETM
through MNSETR for final test data.

c. Some examples in the ESG-created example spaces may be too long
for processing by GZEC and LOOPER, and these examples will be ignored.

d. There is no way for the user to intervene and remove special bad
cases in the counter data file CDAT.RV created by REVEXA and REVEX. This
mostly has the effect of increasing the false alarm rate later on in the
other programs.

e. Most significantly, the CHAINMIND method has a poorer facility for
recovering from abnormal or error situations than the step-by-step
approach. This means that there are abnormal situations with which
CHAINMIND cannot cope, and in which it will crash.
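
The fixed allocation of magic number sets in item b can be summarized as a small lookup. This Python sketch is only an illustration of that allocation; the function name and role strings are hypothetical and are not part of CHAINMIND.

```python
import string

ROLES = ("training", "interim test", "final test")


def chainmind_role(set_name):
    """Map a magic number set name such as 'MNSETH' to its fixed role."""
    name = set_name.upper()
    letters = string.ascii_uppercase[:18]  # 'A' through 'R'
    if len(name) != 6 or not name.startswith("MNSET") or name[5] not in letters:
        raise ValueError(f"not one of MNSETA..MNSETR: {set_name}")
    # A-F -> training, G-L -> interim test, M-R -> final test.
    return ROLES[letters.index(name[5]) // 6]
```

For example, `chainmind_role("MNSETH")` falls in the second group of six and so is reported as interim test data.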

A.4 THE VDGS STAND-ALONE VERSION

The other method of using the VDGS programs is a step-by-step
interactive procedure wherein the user executes each program in turn,
responds to its prompts, and examines its output as necessary. A list of
the VDGS programs in the order in which they are to be executed appears in
Table A1. A description of this step-by-step approach follows.

TABLE A1. VDGS Programs in Order of Execution

     1.  EXTRACT
     2.  ESG
     3.  GZEC
     4.  RESCUE
     5.  SIGH
     6.  LOOPER
     7.  REVEXA
     8.  RVDIT
     9.  COVERT
    10.  INVERT
    11.  CROAK
    12.  REVEX
    13.  RVDIT
    14.  CROAK
    15.  ADDER
    16.  AVRAJ
    17.  CRAP
    18.  GAPSTER
    19.  SORTRA
    20.  SORTRB
    21.  GAPSTER
    22.  MUTE
    23.  GLOVE
    24.  TAILOR
    25.  BUILDER
    26.  DEALER
    27.  PHEW

NOTE: Three routines, RVDIT, CROAK, and GAPSTER, are run at two different
stages.

This approach has some obvious advantages over the CHAINMIND approach.
First of all, the programs can be run individually in relatively small
blocks of computer time and do not require a 20-hour block as does
CHAINMIND. Secondly, error situations and abnormal conditions are much
more easily responded to than in the CHAINMIND approach. If an error
occurs in the step-by-step approach, one need only back up a step or two
and restart. Thirdly, as will be seen in the sequel, the step-by-step
approach offers a degree of flexibility not available in CHAINMIND.

With these observations in mind, let us consider the VDGS programs in
their order of execution.



EXTRACT 



The process of voice data extraction begins with the collection of
voice samples by the program EXTRACT. For each utterance EXTRACT creates
a compressed data file (-.CD), and an optional raw data file (-.RD), all
the while maintaining a listing file (EXTOUT.LS) if desired. It is the
set of compressed data files that is used in the remainder of the voice
data generation procedure.

Probably the simplest way to collect voice data samples is to proceed
as follows: Take a disk which has been formatted and initialized and
which is substantially empty. Create a subdirectory with a three-letter
name, and copy onto this subdirectory the set of magic number sets
MNSET* which are to be used, as well as the file EXTRACT.SV. This
subdirectory will then hold all the -.CD files and any -.RD and listing
files created by EXTRACT. The separate description of the program EXTRACT
tells how to proceed from here. But some additional comments are in order.

a. The room where the voice extraction is to be done should, of
course, be kept as quiet as possible to avoid excessive noise in the voice
signal. The volume adjustment should be set so that the meter registers
about [value illegible] when the word "five" is spoken. The microphone
headset should be adjusted so that it is comfortable and so that the
microphone itself is [distance illegible] from the speaker's mouth.

b. The number of voice samples taken during LCSR and VIAS work was
eighteen magic number sets worth of utterances - six designated training
data, six designated interim test data, and six designated test data. It
is a good idea to limit the speaker to three magic number sets at a
sitting to avoid degradation in the voice samples due to speaker fatigue
or boredom. For the step-through operation of the VDGS routines, it is
possible to use fewer than six magic number sets apiece for training,
interim test, and test data; but we still recommend that [number
illegible] magic number sets worth of data be used.

c. The program EXTRACT, if desired, creates a listing file [several
lines illegible in the source; the surviving fragments mention printouts
of the compressed data files and the handling of the file EXTOUT.LS
between magic number set runs]. This file was saved and printed for one
magic number set at the beginning of VIAS, as part of a check that the
TTI-500 preserves the same features that were identified in the VIP-100.

d. Utterances which are misspoken should be deleted in the sense
that the corresponding -.CD and -.RD files should be deleted. Then the
utterances should be respoken using the alternate mode of extraction
permitted by EXTRACT when no prompting file is used.

At the completion of voice data collection for a speaker, we recommend
that any raw data (-.RD) files created by EXTRACT be moved to a
separate disk or to tape. They are not needed in the rest of the VDGS
processing, and they take up a great deal of space on the disk. Also, any
listing files created by EXTRACT should be handled in the same way.

ESG 

Once voice data has been collected and compressed by EXTRACT, the
next task of the VDGS is to create example spaces for use by GZEC and
LOOPER. The program ESG is responsible for the creation of example
spaces. The separate program description for ESG explains basically how
to operate it, but perhaps a few suggestions are in order.

The example space name must begin with "ES", but the rest of the name
is open to the user. We suggest that the example space names have the
format

ES$XYZ$nm

where XYZ are the initials of the speaker and nm is the vocabulary item
number (so, for example, Ulysses S. Grant's example space for item 4 would
be ES$USG$04).
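
The suggested naming format can be illustrated with a short Python sketch. The function name is hypothetical, and the checks on the arguments are assumptions added for the illustration rather than rules enforced by ESG.

```python
def example_space_name(initials, item_number):
    """Build a name of the suggested form ES$XYZ$nm."""
    if len(initials) != 3 or not initials.isalpha():
        raise ValueError("initials should be three letters")
    if not 0 <= item_number <= 99:
        raise ValueError("item number should fit in two digits")
    # nm is the vocabulary item number, zero-padded to two digits.
    return f"ES${initials.upper()}${item_number:02d}"
```

With the initials USG and item 4, the function returns the name given in the text, ES$USG$04.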

Also, the user must create a prompting file containing the names of
all compressed data files to be used in building the example spaces. A
version of this prompting file, called PFILE, is delivered with CHAINMIND,
but this prompting file assumes that the training data came from magic
number sets MNSETA through MNSETF, and consequently that the example
spaces are to be built from compressed data files A, B, C, D, E, F-.CD.
Should the user want a different prompting file, the CLI command BUILD
should be used. For example, to build a file of names of compressed data
files corresponding to magic number sets MNSETG and MNSETK, type

BUILD prompting file name G-.CD K-.CD

Then the user must use the editor to insert carriage returns between each
entry and to eliminate carets.
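
For readers without access to the original CLI, the end result of the BUILD-and-edit procedure (a file with one compressed data file name per line) can be sketched in Python. This is an illustration only: the function name is hypothetical, and it assumes, as the G-.CD template suggests, that files are selected by their leading set letter and the .CD extension.

```python
def prompting_file_text(file_names, set_letters):
    """Return prompting-file text: one matching .CD file name per line."""
    wanted = {letter.upper() for letter in set_letters}
    selected = [
        name
        for name in file_names
        # Keep compressed data files whose first character is a wanted set letter.
        if name.upper().endswith(".CD") and name[:1].upper() in wanted
    ]
    return "\n".join(selected) + "\n"
```

Given the directory listing `["G1234.CD", "K5N6P.CD", "A1234.CD", "G1234.RD"]` and the set letters `"GK"`, only the first two names are written, each on its own line.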

Besides the prompting file, the routine ESG also requires WIZ.ST, a
file of length and stretch factors. This is a canonical file to be used
for all speakers, and it is delivered in the software package.

In its step-by-step form ESG runs once for each vocabulary item. The
length of time for each run should be about 20 minutes.

Once ESG is run and the output examined, it may be the case that ESG
has marked some words as too long for further processing. The user has
two options at this point: (1) continue and let the programs GZEC and
LOOPER ignore the words which are too long, or (2) use the example space
editor ESDIT to modify the word lengths. For an explanation of the
operation of ESDIT, see the separate program descriptions. If the user
chooses to bypass ESDIT and let GZEC and LOOPER ignore some examples, then
the data base used to construct transition and loop letter sets will be
reduced to the extent of the number of ignored words.

In lieu of using ESG to generate example spaces automatically, one
could also use the program GWIZ to facilitate hand-marking the training
data and the program MEND to create example spaces using the data produced
by hand-marking. For descriptions of these programs, see the VDGS
auxiliary programs.

GZEC

Once example spaces have been created, the VDGS is ready to generate
transition and loop letter sets. Since the collection of transition
letter sets is the single most critical item in the VDGS data base, it is
mandatory that each transition letter set be generated correctly. The
program GZEC, embodying the critical algorithm GENRLIZ, generates the
transition letter sets. For an explanation of GZEC, see the separate
program descriptions. GZEC runs once for each example space (and
consequently for eleven vocabulary items should be run eleven times).
Each run of GZEC takes about 20 minutes. Since some of the questions the
user will be asked by GZEC are not entirely self-explanatory, we make some
suggestions for responses below.

a. The listing file should be printer and not disk, to conserve disk
space.

b. Do not change the value of SDCOEFF.

c. Do not change the cost-weight factors.

d. There is no existing machine to be generalized.

The operation of GZEC does not produce a single collection of
transition letter sets to be used. Rather, GZEC keeps a history of the
transition letter sets formed at each stage of its operation. It is up
to the user - using the routine RESCUE - to pick out the best transition
letter set for each vocabulary item.

Since it may happen that a particularly bad example of an utterance
occurs in the example space, or that a bad "cutting" of a vocabulary item
within an utterance has occurred, any collection of transition letter sets
may be resurrected as long as the temporary files TRLS**.TM and TRIX**.TM
still exist. This "resurrection" is done using the program RESCUE, which
asks for the "RESCUE INDEX." The "RESCUE INDEX" corresponds to that
number on the cost graph produced by GZEC indicating the desired set of
transition letter sets.

The correct RESCUE index for each vocabulary item is determined by
looking at the GZEC printout. Normally the last transition letter set
formed by GZEC is the right one. In this case, look at the cost graph at
the end of the particular GZEC run, and determine the last "machine
number" corresponding to the column of utterance names on the left hand
side.

If it should happen that the last transition letter set formed by
GZEC differs significantly from the next-to-last or next-to-next-to-last,
then an earlier machine number should be chosen. In this case "differ
significantly" means that the final transition letter set was formed by
dropping three or more transition letters from the preceding transition
letter set. In a rare case the last transition letter set will be
"significantly different." Still rarer, the last two transition letter
sets will be significantly different. Once these machine numbers have
been chosen by the user for each vocabulary item, we're ready to run
RESCUE, pluck out the transition letter sets corresponding to those
machine numbers, and set up the transition letter set files to be used
for the rest of VDGS processing.
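
The selection rule above can be sketched in Python. This is a hedged illustration, not the procedure built into GZEC or RESCUE: the function name is hypothetical, each transition letter set is modeled as a plain Python set, the returned value is a 0-based position in the history (the actual machine number must still be read from the cost graph), and the limit of two steps back is an assumption drawn from the "last two" remark in the text.

```python
def choose_history_position(letter_sets, threshold=3, max_steps_back=2):
    """Pick the position of the transition letter set to rescue.

    letter_sets is the GZEC history, oldest first. Start at the last
    set and step back while the current set was formed by dropping
    `threshold` or more letters from the set preceding it.
    """
    position = len(letter_sets) - 1
    steps = 0
    while position > 0 and steps < max_steps_back:
        dropped = letter_sets[position - 1] - letter_sets[position]
        if len(dropped) < threshold:
            break  # not significantly different; keep this one
        position -= 1
        steps += 1
    return position
```

In the normal case, where the final set differs from its predecessor by fewer than three dropped letters, the function returns the last position.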

RESCUE 

To run RESCUE, follow the instructions in the separate program
description. In RESCUE, the term "rescue index" means the same thing that
"machine number" does in the GZEC printout. RESCUE must be run once for
each vocabulary item. Its execution time is minimal.

We recommend that the user not delete the files TRLS**.TM and
TRIX**.TM when given this option by RESCUE. Should the transition letter
sets created by RESCUE be accidentally deleted or become inaccessible,
they can be re-created if the temporary files TRLS**.TM and TRIX**.TM
still exist. Otherwise, it would be necessary to run GZEC all over again.

SIGH

The program SIGH is run next. It checks the transition letter sets
one-by-one to see if their length exceeds 13. If so, the length is
reduced by omitting the letters with most "T" features. If not, SIGH
simply passes on to the next item.

LOOPER

Loop letter sets can be found by the program LOOPER. LOOPER must be run
once for each example space and so must be run once for each vocabulary
item - eleven times for eleven items. The run time for a single LOOPER
run in this environment is about 30 minutes. A complete description of
the operation of LOOPER is included in the program descriptions.

REVEXA

The real statistical data collection process begins with REVEXA after
the transition and loop letter sets have been created. The purpose of
REVEXA is to collect counter data statistics, i.e., statistics of time of
residence of an incoming letter from an utterance in a transition or loop
letter set. To this end, REVEXA must be run over all training data, that
is, over all magic number sets used to generate the training data. The
explanation of how to run REVEXA is included in the program descriptions.
Recommended responses to some of the prompts are given below:

a. The mode of data acquisition should be 3. The magic number sets
to be used here should be the ones designated for training data.

b. All optional printing should be done. A great deal of information
about REVEXA and the recognition process in general is contained in
these printouts.

c. When, on the second and succeeding runs of REVEXA, the user is
asked if the CDAT.RV and CIDX.RV files are to be deleted, the answer
should be "no." In this case the program will continue to append to the
old files, and this is what is needed.

d. Later on we will explain why the user might choose to run one or
two initial machines. If the user is doing so, he must enter the
vocabulary item number for each initial machine and also a stop time (in
TTI-500 time count units) for that initial machine.

e. For REVEXA, the user should request that only the machines in the
utterance should be used.

A large file of counter data statistics is created by running REVEXA
over the six magic number sets constituting training data. For some of
the utterances processed by REVEXA, the subroutine MINIMINT (which mimics
the operation of the MINT part of LISTEN) cannot come to a conclusion. In
that case the user has two options: (1) ignore the misses and run RVDIT
with no counter data record modifications, or (2) list the record numbers
of all items occurring in a MINIMINT failure, and create a file RVCARDS in
the format described in A.9, with record number entries for all the
records to be flagged as "real." If the misses are simply ignored, the
number of artifacts generated in subsequent routines will be somewhat
larger, and the distinction between real recognitions and artifacts
blurs in proportion to the number of items ignored.

The run time of REVEXA is about thirty minutes per magic number set.

RVDIT

The program RVDIT creates individual counter data records for each
vocabulary item and, if desired, deletes from consideration all records
specified in the file RVCARDS. The use of RVDIT is further explained in
the program descriptions. Run time for RVDIT is about twenty minutes.

COVERT

The routine COVERT is run next. Its principal function is to create
the covariance matrix for each machine. The details of its operation and
use are explained further in the program descriptions.

INVERT 

The routine INVERT is the next step. It inverts the covariance
matrices created by COVERT. Its use is also explained later. In running
this routine the user has the option of computing and printing the
eigenvalues of the covariance matrices. The actual eigenvalues are not
used later in the processing, so they may or may not be computed at the
user's discretion.

CROAK 

The last routine which operates using training data is CROAK. The
program CROAK is an eclectic routine which performs all manner of
statistical computations and prints plots of distributions [symbols
illegible in the source]. A discussion of the operator's interaction with
CROAK appears in the program descriptions of this appendix. CROAK must be
run twice over each vocabulary item - once to generate statistics about
real recognitions, and once to generate statistics about artifacts. So,
for the first CROAK run, the user should:

a. Answer "2" to the question about modes

b. Save the probability statistics

c. Set starting machine number = 00 and end machine number equal to
the last machine used (10, 11, or 12)

Then CROAK runs about thirty minutes. For the second CROAK run, the user
should:

a. Answer "4" to the question about modes

b. Enter starting and end machine numbers as before

Then CROAK runs again for about thirty minutes.

SOME CLEANUP 

At this point the user should do some disk cleanup. He can and
should delete the following files: CDAT.RV, CIDX.RV, QDAT.RR, QDAT.AF,
MUDT.RR, MUDT.AF, RVX.ST, and RVCARDS.

REVEX 

Now we begin to use the interim test data. The general description
of the use and functions of the program REVEX is contained in the program
descriptions, but there are a few suggestions we should make.

a. The mode of data acquisition should be 3. The magic number sets
to be used here should be the ones designated for interim test data.

b. All optional printing should be done.

c. When, on the second and succeeding runs of REVEX, the user is
asked if the CDAT.RV and CIDX.RV files are to be deleted, the answer
should be "no." Then REVEX will continue and append to the old files.

d. If the user is running one or two initial machines, he must, at
the appropriate prompt, enter the vocabulary item number for each initial
machine and also a stop time (in TTI-500 counts) for each initial machine.

e. The user should request that all machines, not just the ones in
the utterance, be run. This is important because this is the point at
which data about artifacts is gathered.

With REVEX, as with REVEXA, a large file of counter data statistics
is created by running the program over the six magic number sets of
interim test data. Also, some of the utterances will not be recognized
correctly by the MINIMINT subportion of REVEX. In these cases the user
once again has the choice of ignoring the MINIMINT failures and
proceeding, or of creating the file RVCARDS of records to be flagged as
"real." If this option is chosen, all record numbers corresponding to
real recognitions should be entered in RVCARDS.

RVDIT 

Run RVDIT just as before on REVEX output files CDAT.RV and CIDX.RV. 

CROAK 

Run CROAK just as before.

ADDER 

The next VDGS routine to be run is ADDER. This program builds a
table of transition and loop letter set violations for each utterance
processed by REVEX. A complete description of ADDER is given in the
program descriptions, and the only additional suggestion to be made is
that the output should not be directed to disk since disk space is
probably sparse at this point.

AVRAJ

The program AVRAJ is the next step in the process. AVRAJ computes
and prints the average word length for all vocabulary items. The routine
[remainder of sentence missing in the source]

CRAP

The critical association parameters are determined by CRAP. CRAP
should be run as indicated in the program descriptions. As usual, the
listing file should be directed to the printer and not to the disk.

GAPSTER 

The program GAPSTER is primarily responsible for creating the gap
matrix and the QASM matrix needed in the MIND file. The operation of
GAPSTER is described in the sequel in the program descriptions, but a few
comments about user inputs to GAPSTER are suggested below.

a. The critical association parameter entered should be 1.0.

b. The real standard deviation spread factor for the gap matrix also
should be entered as [value missing in the source].

c. The quartile and mean calculations are optional and not used in
later processing.

d. Disk file output should not be chosen.


SORTRA 

Run SORTRA to sort the file GAPMAX.

SORTRB

Run SORTRB to sort the file CONGAP.

GAPSTER 

Delete the files QASM.DT, GAP.DT, and GAPMAX, and re-run GAPSTER as
before.

MUTE 

Run MUTE to compute the L-counter parameters MDLA** for each machine.

GLOVE

Run GLOVE to do the curve fitting for CROAK-generated 5-distributions.

TAILOR

Run TAILOR to compute the T-counter parameters MDTA** for each
machine.

BUILDER

Run BUILDER to create the machine data file.

DEALER

The program DEALER starts building the MIND file. For the most part, its
operation is described in the program descriptions of this appendix.
However, there are a couple of required operator inputs that are not
completely self-explanatory. In the first place, the "revision number for
this job" should be entered as "0" for all speakers. Secondly, if
there are one or more universal machines being run, the program DEALER
will ask for vocabulary identification number and end time for each
machine, and these must be supplied by the user. For example, if machine
11 is a universal machine for vocabulary item 2, an appropriate response
might be "2,25" when vocabulary item identification and end time are asked
for.



PHEW 



Then the process of the VDGS is completed by PHEW, which finishes the
building of the MIND file. The only response required of the operator
here is the entry of the total number of vocabulary items used.

A.5 THE AUXILIARY PROGRAMS

The auxiliary programs delivered with the VDGS are: GASP, ESDIT,
ESGDIT, MEND, and GWIZ. Here we describe the function of these auxiliary
routines and indicate how they add to the flexibility of the VDGS.

The program GASP has the simple function of printing the transition
letter sets after the program RESCUE has been run, or printing the merged
transition and loop letter sets after the program LOOPER has been run.
Using GASP the user can see, and group together on hardcopy for future
reference, the transition letter sets for each vocabulary item (with
accompanying loop letter sets, if desired).

The programs ESDIT and ESGDIT are both concerned with the editing of
example spaces. ESDIT allows the user to change individual start or stop
times in the example space using either an ESG or a GZEC printout. The
reason that ESDIT is sometimes used is that the individual start/stop
times in the example spaces are sometimes bad - either the length
for the word is too long, or the word has been "cut out" from the
utterance in a less than satisfactory way. This "bad cutting" can arise
from either an anomaly in the automatic example space generator ESG, or a
human error if manual hand-marking is done using GWIZ and MEND.

In any case, if the user wishes to modify the individual start/stop
times in an example space, just what numbers are entered depends upon what
program's output is being used. If an ESG printout is being used, simply
enter the new start and stop times when the program requests them. If a
GZEC printout is used, the situation is a little more complicated. If the
old beginning time is Tu and the user wishes to change this to
Tv, enter Tv - Tu + 1 as "new starting record". So, if the beginning
record number is correct as is, enter 1. To change the end time from
[symbol illegible] to [symbol illegible], enter [symbol illegible] -
([symbol illegible] + total number of records in word) + 1 as
"new ending record".



The program ESGDIT is designed to operate on an existing example
space file to produce a new example space file in which all utterances
beginning with the vocabulary item specified are omitted. The operation
of ESGDIT is explained fully in the program descriptions. Just why ESGDIT
is used is explained below where universal and initial machines are
discussed.

The routines GWIZ and MEND are programs to be employed when manual
"hand-cutting" of utterances into individual words is to be used as a step
in the creation of example spaces. The technique of hand-cutting is
described later. The general procedure for the semi-manual creation of
example spaces is as follows:

a. Create a file GWIZ.CD which holds the names of all compressed
data files corresponding to training data. This file should hold one file
name per line, left justified. A relatively painless way of constructing
this file is to use the BUILD command at the CLI level as was previously
described for the program ESG.

b. Run GWIZ, following the instructions given in the separate program
description.

c. Hand-cut the GWIZ printouts, noting start and end times of each
word within each utterance.

d. Create a file MEND.WD holding all this data from hand-cutting
(the format of this file is described in the description of MEND).

e. Run MEND to create the example spaces.

A.6 HAND-CUTTING THE DATA

The process of hand-cutting data to separate words within an
utterance is as much of an art as a science and is best learned by doing.
However, there are some rules of thumb and general guidelines that the
speech technician might wish to consider.

a. The program GWIZ, itself, indicates locations of words within
utterances on its printout. These are quite helpful but generally are not
refined enough to be used as more than guidelines.

b. Whoever hand-cuts data must become extremely familiar with the
hard copy presentation of an utterance and the variety of patterns
associated with each particular item. It is a good idea to start with
utterances consisting of a single word and to compare those with utterances
where that word is only a part. The important thing is to be able to
distinguish words visually and locate the interword boundaries. Because of
co-articulation effects, it is important to allow overlap of word
boundaries; rarely, in the context of hand-cutting, should the utterances
be divided into non-overlapping segments. The idea here is that the
transition letter set maker, GZEC, will discern the important structure
within the utterance, and that one should not attempt to make too fine or
too subtle distinctions during the hand-marking process.

c. The geometrical configurations of the vocabulary items are a
significant aid in marking the data. The "shapes" of words should be
learned well before much hand-marking is done.

d. Sibilants and fricatives are a big help in marking the data. The
"s", "x", and "th" sounds, when identified, make the demarking procedure
much easier.

e. The relative letter counts indicated on the GWIZ printout help
pick out long sounds, e.g., "...ee" in "three", "oo" in "two", etc.

f. Features 16-19 in the GWIZ printout are generally set for fricatives,
e.g., "s" in "six"; features 15-18 are often set in the "x" of "six".

g. Relatively long stops occur preceding "two", "point", and
"three", when these items occur in the middle of an utterance.

h. The vocabulary item "eight" is short and hard to pick out. For
this item, attention is best paid to relative time counts of two or three
different basic sound groups.

i. Fluid vowel sounds are sometimes very hard to distinguish, and
often vary considerably from sample to sample.

A.7 ON UNIVERSAL AND NON-INITIAL MACHINES

For some speakers a given vocabulary item can apparently vary
significantly, depending on whether it is the initial word in an utterance.
If the difference between initial and non-initial voicings of a word is
significant enough, then a recognition process which does not distinguish
initial from non-initial will not work very well. The VDGS has some
facility for dealing with this problem, at least to a limited extent.

When REVEXA is run, one has, for the first time, some indication of
how well the transition letter sets are performing for the standard eleven
vocabulary items. If the recognition of initial digits is noticeably bad
for one or two items, the speech technician has the option of creating
"universal" and "non-initial" machines for these items, with separate
transition and loop letter sets. Concretely, this means that the
following steps must be carried out (to be specific here, we assume that
the vocabulary items "two" and "three" for Ulysses S. Grant require both
non-initial and universal machines):

a. Rename ES$USG$02 as ES$USG$11

b. Rename ES$USG$03 as ES$USG$12

c. Run ESGDIT, entering "2" as vocabulary item, example space name
ES$USG$11, and new example space name as ES$USG$02.

d. Run ESGDIT, entering "3" as vocabulary item, example space name
ES$USG$12, and new example space name as ES$USG$03.

e. Rename MC02.TL as MC11.TL

f. Rename MC03.TL as MC12.TL

g. Rename MC02.LP as MC11.LP

h. Rename MC03.LP as MC12.LP

i. Delete TRLS02.TM, TRLS03.TM, TRIX02.TM, and TRIX03.TM

j. Run GZEC for the new example spaces ES$USG$11 and ES$USG$12

k. Run LOOPER for these new example spaces

l. Rerun REVEXA.
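
The renaming and deletion steps above follow a regular pattern, which the following Python sketch makes explicit by generating the plan as data. It is an illustration only: the function name and the tuple layout are hypothetical, nothing is renamed or deleted on disk, and the steps are grouped per vocabulary item rather than listed in the lettered order used above. The GZEC, LOOPER, and REVEXA runs are left to the operator.

```python
def universal_machine_plan(initials, items, first_new_machine=11):
    """List the file operations for giving each item a universal machine."""
    steps = []
    for offset, item in enumerate(items):
        old = f"{item:02d}"
        new = f"{first_new_machine + offset:02d}"
        old_space = f"ES${initials}${old}"
        new_space = f"ES${initials}${new}"
        # The original example space becomes the universal one ...
        steps.append(("rename", old_space, new_space))
        # ... and ESGDIT rebuilds the original name without utterances
        # that begin with this vocabulary item.
        steps.append(("esgdit", item, new_space, old_space))
        steps.append(("rename", f"MC{old}.TL", f"MC{new}.TL"))
        steps.append(("rename", f"MC{old}.LP", f"MC{new}.LP"))
        steps.append(("delete", f"TRLS{old}.TM"))
        steps.append(("delete", f"TRIX{old}.TM"))
    return steps


plan = universal_machine_plan("USG", [2, 3])
```

For the Ulysses S. Grant example, the plan contains twelve entries covering steps a through i.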

A.8 PROGRAM DESCRIPTIONS FOR VDGS ROUTINES

Program descriptions for VDGS routines follow:

1. EXTRACT

Title: EXTRACT.SV

Purpose:

The purpose of EXTRACT is fourfold:

a. Prompt the speaker to voice an utterance.

b. Save the TTI-500 features generated by that utterance on a disk
file.

c. Compress the features for LCSR processing.

d. Provide hardcopy printouts of both raw feature data and
compressed data.

Printout:

The TTI-500 detects 32 features every 2 msec. One of these features
(LP4) signals a long pause: the speech sample is complete. The
software backs up 50 TTI-500 samples (a sample is a set of 32
features), and continues going back through the samples. When feature
26 (UVNLC), 28 ([name illegible]), or 29 ([name illegible]) is found,
the search terminates. This collection of features we call the "raw
data."
The data extraction program, at the user's option, will print this
raw data in the standard space/asterisk format. The printout is
consistent with TTI conventions: feature 1 (NAXqI) is at the left,
feature 32 (LP4) on the right.

A letter is the subset of features 17-31. Associated with each
letter is a count of the number of times that letter occurred in the
raw data, interrupted by not more than one occurrence of any other
letters. (Such single count letters are always ignored.) This
collection of letters and counts we call the "compressed data."

The data extraction program reduces the raw data and prints the
resulting compressed data.
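
The compression rule described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the EXTRACT algorithm itself: a sample is modeled as a sequence of 32 values with feature 1 first, the function names are hypothetical, and an interrupting single-count letter is assumed to be dropped without being added to the surrounding count.

```python
from itertools import groupby


def letter_of(sample):
    """Extract the letter (features 17-31) from a 32-feature sample."""
    return tuple(sample[16:31])


def compress(letters):
    """Reduce a letter sequence to (letter, count) pairs.

    Single-count letters are ignored, and a run may continue across at
    most one such ignored letter.
    """
    runs = [(letter, len(list(group))) for letter, group in groupby(letters)]
    compressed = []
    gap = 0  # single-count letters ignored since the last kept run
    for letter, count in runs:
        if count == 1:
            gap += 1
            continue
        if compressed and compressed[-1][0] == letter and gap <= 1:
            # Same letter resumed after at most one interruption.
            compressed[-1] = (letter, compressed[-1][1] + count)
        else:
            compressed.append((letter, count))
        gap = 0
    return compressed
```

Using single characters to stand for letters, the sequence AAABAACCD compresses to A with count 5 and C with count 2: the lone B is ignored and bridges the two A runs, and the trailing lone D is dropped.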

User Dialog: 

EXTRACT 

(Ensure that any subdirectories to be used are initialized. It is
recommended that each user utilize a personal subdirectory so that
multiple copies of data files for the same digit string can be kept.
An identifier and instructions appear:)

DATA EXTRACTION PROGRAM — ECLIPSE RDOS REV 6.23 

STRIKE CNTRL-A TO EXIT FROM PROGRAM

(The user is requested to enter his name and an identifying comment
of up to 80 characters for this run. This information, together with
the date, is printed as a header on all printouts of the program.

Next, the user is queried to determine if he wishes to use a
predefined prompting file. If so, he enters the file name. The program
will verify that the name and file exist, but no other special checks
are made.

Following queries to determine if the user wishes the printout of raw
data and/or compressed data, the program prepares to accept speech
data. If no prompting file is named, the user is commanded:)

SPEAK!!

(The TTI-500 is activated and the program "listens" until the LP4
feature is detected. The TTI-500 is then deactivated and the user
is requested to)

ENTER COMMENT LINE:

(Up to forty characters may be entered, then)

ENTER RAW DATA FILE NAME:

ENTER COMPRESSED DATA FILE NAME:

(If a file name which is already used is entered, the user is told of
the condition and requested to redefine the file. A more serious
error (e.g., directory not initialized) will cause an abnormal return
to CLI.

The following convention is recommended for use in naming the raw
data and compressed data files: five characters, dot, "RD" or "CD".
Of the five characters, the first is the number set identifier (A-K)
or "X" if no number set is used. The following four characters
represent the number spoken, with "N" meaning "null" and "P" meaning
"point." The .RD and .CD extensions refer to "Raw Data" and
"Compressed Data."
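
A small checker for the recommended naming convention might look like the following Python sketch. It is an illustration only: the function name `parse_data_file_name` is hypothetical, and the four "number spoken" characters are assumed to be drawn from the digits 0-9 plus "N" and "P".

```python
import re

# One set identifier (A-K, or X for none), four spoken characters, then .RD or .CD.
_DATA_FILE_NAME = re.compile(r"([A-KX])([0-9NP]{4})\.(RD|CD)")


def parse_data_file_name(name):
    """Split a data file name into its parts, or return None if it does not fit."""
    match = _DATA_FILE_NAME.fullmatch(name)
    if match is None:
        return None
    set_id, spoken, extension = match.groups()
    return {
        "number_set": None if set_id == "X" else set_id,
        "spoken": spoken,
        "kind": "raw" if extension == "RD" else "compressed",
    }
```

A name that does not follow the convention is not an error for EXTRACT itself, since the convention is only recommended; the checker simply reports such names as `None`.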

Once the files are named, the program proceeds to print the
compressed data and raw data (if the user requested it) and to write the
data onto the disk files. The program then goes back to listening,
signaled by the SPEAK!! command. Note that unless spooling is
disabled, the line printer may still be active (very noisy!) when
SPEAK!! is offered. The user should be careful not to turn on the
microphone until after the printing is complete.

If the user had named a prompting file, the request to SPEAK!! will
be replaced by)

SAY: number

(Where the number is retrieved from the prompting file. When the
TTI-500 "hears" something, assumed to be the prompted string, the
notification)

(is given. The file names are automatically retrieved from the
prompting file (the convention noted above is used) and the printout
and disk writing is performed. If the files already exist, the user
is requested to intervene and name files on-line for the data. The
user should note these problems and resolve them following the data
extraction session. When the prompting file is exhausted, the
warning)

NO MORE PROMPTS IN file name

(is given and the user is informed that the program will)

GO BACK TO START!
Input Files:

MNSET-, the number set file
Output Files:

-.CD, the compressed data files
-.RD, the raw data files

Error Messages:

Only the standard RDOS file error messages are applicable to EXTRACT.

2. ESG

Title: ESG.SV
Purpose:

ESG builds an example space for a specified vocabulary item from
specified compressed data files. Inputs to ESG include a list of the
compressed data files which contain this item, and the canonical set
of length and stretch factors. ESG determines what portion of the
utterance is most likely to contain that item, and writes the file
name and starting and ending records to the example space file.

Printout:

ESG provides a printout which describes the example space file
entries. A printout of the automatically selected portion of the
utterance is also provided under certain conditions when ESG
determines that hand marking of the data may be required. This occurs
whenever the selected portion of the utterance is too long for
GENRLIZ to accommodate, and when a doublet occurs. If modifications
are required, the example space file can be edited using ESDIT.

User Dialog:

ESG

ENTER THE EXAMPLE SPACE FILE NAME
(THE FIRST 2 CHARACTERS MUST BE 'ES'):

(This is the designated file name of the example space file which is
to be generated.)

FILE ALREADY EXISTS.

MAY I DELETE IT (Y/N)?

(If the example space file already exists, the user can choose to
continue by deleting the existing file or terminate.
If he chooses to terminate, the CRT displays)

STOP- EXAMPLE SPACE FILE ALREADY EXISTS.

(Otherwise the dialog continues)

ENTER A BRIEF DESCRIPTION OF THE FILE:

ENTER THE 2-DIGIT VOCABULARY ITEM # (00-10):

ENTER THE PROMPTING FILE NAME:

(The prompting file contains the names of the compressed data
files used in generating the example space file.)

ENTER THE 3-LETTER SUBDIRECTORY NAME:

COMPLETED PROCESSING VOCABULARY ITEM #:
USING PROMPTING FILE:     AND SUBDIRECTORY

DO YOU WISH TO CONTINUE PROCESSING ON THIS VOCABULARY ITEM (Y/N)?

(If the user wishes to continue processing, the program requests
another input of a prompting file name and a subdirectory name. The
program continues building the example space file on the same
vocabulary item using the newly specified prompting file and subdirectory.
Otherwise the program terminates.)

STOP

Input Files:

WIZ.ST, the file of length and stretch factors

Prompting file (a user-supplied filename, e.g. PFILE) with the file
names of the compressed data files to be used.
Specified compressed data files

Output Files:

Example space file for the specified vocabulary item.
Error Messages:

INVALID VOCABULARY ITEM # ENTRY

(Another input is requested.)

STOP-FILE WIZ.ST DOES NOT EXIST

(The program terminates without this input file.)

CKST--FILE DOES NOT EXIST:

(If the prompting file does not exist, the program terminates. If a
specified compressed data file does not exist, the program continues
with the next specified compressed data file.)

CKST--UNKNOWN ERROR:     FILE:

3. GZEC

Title: GZEC.SV
Purpose:

GZEC finds the set of transition letter sets for a specified
vocabulary item, using GENRLIZ.

Printout:

GZEC provides three types of printout. The first describes the
development of the set of transition letter sets. In the compressed
data printout, the header shows the feature number. Below this, in
the "FREQ" column, the letter itself, delimited by a symbol, is
printed. The features set in a letter are shown by a symbol;
blanks indicate the feature was not set. In the transition letter
set printout, a mark means the feature must be set, a blank means
the feature must not be set, "?" indicates indifference, and "N"
shows where a modification to accommodate this utterance occurred.
The mapping of the transition letter sets into the utterance is shown
by printing the particular set next to the first letter in the
utterance which is contained in that set and which occurs after a letter
in the previous set. The "NUMBER OF TRANSITION LETTER SET" column shows
the relation of the current sets to the initial or seed set of
transition letter sets.

The second printout shows the cost to modify the transition letter
sets, and the current mean cost and standard deviation, for each example
encountered. The plot of these values is useful for detecting bad
examples. The rescue index is used to retrieve any particular set of
transition letter sets for further use.

The third printout shows the cost to modify particular sets of
transition letter sets ("MACHINE NUMBER" on the printout) to accommodate a
particular example. A cost of 0.0 indicates that no modification was
required. A symbol marks the birth of a new machine; that is, it shows
that the previous machine was modified to accommodate the example.

User Dialog:

GZEC

ENTER NAME OF EXAMPLE SPACE FILE:

ENTER NAME OF SUBDIRECTORY WHERE
TEMPORARY FILES ARE TO RESIDE:

ENTER TWO DIGIT VOCABULARY ITEM NUMBER:

(If temporary files already exist, the system queries)

FILES ALREADY EXIST: TR**--.TM
MAY I DELETE THEM? (Y OR N):

(A "N" response causes the system to STOP. Rename .TM files before
resuming processing.)

ENTER LISTING FILE
(P => $LPT, D => DISK):

(If the "D" option is selected, the listing file name is constructed
from the example space file name with the .LS extension. If this
listing file already exists, the system queries)

MAY I DELETE filename? (Y OR N):

(A "N" response causes the system to STOP. Rename .LS file before
resuming processing.)

PRESENT VALUE OF SDCOEF IS XX.X
DO YOU WANT TO CHANGE SDCOEF? (Y OR N):

(This is the value which controls modification of the transition
letter sets. The modification is allowed if the cost is < the mean
cost + SDCOEF standard deviations. If "Y" is entered, the system
responds.)
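
The SDCOEF test can be written out as a short Python sketch. This is a hedged illustration, not the GZEC code: the function name `modification_allowed` is hypothetical, the mean and standard deviation are assumed to be taken over the costs of the examples already encountered, and the population form of the standard deviation is an assumption (the text does not say which form is used).

```python
from statistics import mean, pstdev


def modification_allowed(cost, previous_costs, sdcoef):
    """Return True if cost < mean cost + SDCOEF standard deviations."""
    # Assumption: statistics are computed over the costs seen so far.
    threshold = mean(previous_costs) + sdcoef * pstdev(previous_costs)
    return cost < threshold
```

A smaller SDCOEF tightens the threshold, which is consistent with the later note that a sufficiently small SDCOEF may leave the first pass without a usable starting set.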

ENTER SDCOEF: 

DO YOU WANT TO USE THE COST WEIGHT FACTORS? (Y OR N) 
(IF NOT, ALL WEIGHTS ARE 1.0. ENTER 'N' FOR HAND MARKED DATA): 

(Weighting factors are used to reduce the contribution of extraneous
end data to the final set of transition letter sets.)

IS THERE AN EXISTING MACHINE
WHICH IS TO BE GENERALIZED? (Y OR N):

(This option allows an existing machine to be generalized to
accommodate new examples. A "Y" response causes the system to prompt:)

ENTER FILE NAME (SUBDIR:NAME):

(If the "N" response was given to the former question, the system
begins the search for a good starting point in the data. If the
value of SDCOEF is sufficiently small, etc., the first pass through
the example space may not yield a starting set of transition letter
sets which satisfy the conditions. In this case the system notifies
the user with)

PAUSE NO INITIAL MACHINE FOUND. SHALL I TRY AGAIN?

(Strike any key to continue.)

STOP, ALL DONE!

Input Files:

ES-, the example space file
-.CD, the compressed data files

MC**.TL, optional set of transition letter sets which is to be
generalized

Output Files:

$LPT or ES-.LS, the listing file

TRLS**.TM, all intermediate sets of transition letter sets
TRIX**.TM, index file into TRLS**.TM

COSTP.TM, temporary file of costs, deleted after graph printouts
FNFF.TM, temporary communication file between GZEC and PRNT7.

Error Conditions:

***WARNING: WORD TOO LONG***
FILE: filename, NLETR: XX

(The system protects itself against overfilling its arrays by
verifying that the utterance to be processed is not too long. The
examples which are too long must be edited using ESDIT. Processing
continues to the next file.)

CKST--FILE DOES NOT EXIST: filename

(If a file is given in the example space which cannot be found at
processing time, the error is noted on the printer and processing
continues.)

CKST--UNKNOWN ERROR: XX FILE: filename

(This error indicates that although the file was found, it cannot be
accessed for some reason which CKST is unable to remedy. Refer to
the RDOS manual for a description of error codes and file status
codes. Again this error is noted on the printer and processing
continues.)

GWRD--UNKNOWN ERROR: XX FILE: filename

(If GWRD is unable to read the compressed data file, it prints this
error and takes the error return. The existence of the file is not
in question when this error is detected, but rather some other file
data error has occurred. The most likely cause is an illegal
starting or ending record specified for the compressed data file resulting
from an error introduced in editing the example space.)

4. RESCUE

Title: RESCUE.SV
Purpose:

RESCUE retrieves a desired set of transition letter sets from a
temporary file. Since the final set of transition letter sets may
not be the best set, the unique transition letter sets and the
information to access them are saved in temporary files.

RESCUE prints the desired set of transition letter sets if requested.
RESCUE deletes the temporary files if requested.

Printout:

User Dialog:
RESCUE

ENTER THE 3-LETTER SUBDIRECTORY NAME:

ENTER THE 2-DIGIT VOCABULARY ITEM # (00-29):

ENTER THE RESCUE INDEX:

(This is the desired set of transition letter sets as determined from
the cost graph produced by GZEC.)

IS THE MACHINE TO BE PRINTED (Y/N)?

(If so, the set of transition letter sets is printed in addition to
being written in the machine file.)

CREATED FILE

(The machine file name is displayed.)

ARE THE FILES     AND
TO BE DELETED (Y/N)?

(The user is queried whether the temporary files are to be deleted.

The temporary file TRLS**.TM saves all unique sets of transition
letter sets for vocabulary item **.

The temporary file TRIX**.TM contains the starting word numbers of
the sets stored in TRLS**.TM and the number of transition
letter sets in each set for vocabulary item **.

After the temporary files are deleted, if so requested, the following
message appears:)

DELETED FILES     AND

STOP PROCESSING COMPLETE

Input Files:

TRLS**.TM, the temporary file of sets of transition letter sets
for vocabulary item **.

TRIX**.TM, the temporary file of starting record numbers and the
number of transition letter sets for the sets of transition
letter sets stored in TRLS**.TM for vocabulary item **.

Output Files:

MC**.TL, the machine file of transition letter sets for vocabulary
item **.

Error Messages:

CKST - FILE DOES NOT EXIST:

(If any of the input files do not exist for the vocabulary item, this
message is output and the program terminates.)

CKST--UNKNOWN ERROR:     FILE:

FILE ALREADY EXISTS:

(If the machine file already exists for the vocabulary item, the
program terminates.)

STOP ON ERROR

(The program terminates for any of the above errors.)
STOP - NO SUCH MACHINE

(The specified machine number does not exist. The program
terminates.)

5. SIGH

Title: SIGH.SV
Purpose:

SIGH checks the transition letter set files MC**.TL created by RESCUE
to determine if the number of transition letters in each MC**.TL file
is less than thirteen. If the number is less than thirteen, nothing
is done; if this number is greater than thirteen, the transition
letters with the greatest number of "?" features are deleted until the
remaining number of transition letters is smaller than thirteen.
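
The reduction rule can be illustrated with a short Python sketch. It is not the SIGH source: the function name `reduce_transition_letters` is hypothetical, each transition letter is assumed to be held as a string with one character per feature, and "?" is assumed to be the character that marks an indifferent feature. The text does not say what happens when there are exactly thirteen letters; the sketch treats that case as needing reduction, which is an assumption.

```python
LIMIT = 13  # letters are removed until fewer than this many remain


def reduce_transition_letters(letters, limit=LIMIT):
    """Drop the letters with the most '?' features until fewer than limit remain."""
    kept = list(letters)
    while len(kept) >= limit:
        # Index of the letter with the most '?' entries; ties go to the earliest.
        worst = max(range(len(kept)), key=lambda i: kept[i].count("?"))
        del kept[worst]
    return kept
```

Removing the letters with the most "?" entries first discards the least specific letters, so the surviving set keeps as many firm feature requirements as possible.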

Printout:

None

User Dialog:

SIGH

ENTER 2-DIGIT STARTING MACHINE NUMBER

ENTER 2-DIGIT END MACHINE NUMBER

(SIGH checks each transition letter set in order. If a transition
letter set need not be reduced, the message)

TRANSITION LETTER SET FOR THIS ITEM OK

(appears on the CRT. If a transition letter set is reduced, the
message)

TRANSITION LETTER SET FOR THIS ITEM REDUCED

(appears.)
Input Files:

MC**.TL, transition letter set for item **
Output Files:

MC**.TL, reduced or unmodified transition letter set for item **
MC**.XY, non-reduced transition letter set for item **

Error Messages: 

INVALID ENTRY - illegal machine number entered 

FILE OPEN ERROR - could not open MC**.TL file.

6. LOOPER

Title: LOOPER.SV
Purpose:

LOOPER finds the loop letter sets for a particular vocabulary item.
It provides a printout which shows the sets of possible loop letter
sets for each example.

Printout:

In LOOPER each example from an example space is printed and next to
it the sets of transition and loop letter sets. The loop letter set
printout is identical in format to the transition letter set format,
except that the words "EMPTY SET" appear to describe this condition
(impossible in transition letter sets). The letter sets are
identified in the far right-hand column. "T1" means transition
letter set 1, "L2" means loop letter set 2, and so on. In some
cases, empty loop letter sets are not shown because the transition
letter set printout takes precedence.

If the example has more than one start point, this printout is
repeated for the subsequent cases.

The final set of loop letter sets which accommodates at least one
start point in each utterance in the example space is also printed.

User Dialog:

LOOPER

ENTER NAME OF EXAMPLE SPACE FILE:
ENTER 2-DIGIT VOCABULARY ITEM (00-10):
ENTER NAME OF SUBDIRECTORY
WHERE SET OF LOOP LETTER SETS IS TO RESIDE
(MUST BE 3 CHARACTERS):

ENTER DESCRIPTION OF THIS RUN:

(The description entered here is printed in the header of the LOOPER
listing.)

STOP LOOPER IS FINISHED
Input Files:

ES-, the example space file

MC**.TL, the set of transition letter sets for this vocabulary item
-.CD, the compressed data files specified by the example space.
Output Files:

MC**.LP, the set of loop letter sets for this vocabulary item
LP**.TM, a temporary file of sets of loop letter sets for the
different starting points in the examples. This file is deleted after
processing is complete.

Error Conditions:

LOOP LETTER SETS ALREADY EXIST FILE: filename

(This fatal error results when the specified loop letter sets already
exist. Delete or rename the specified file before restarting
LOOPER.)

STOP LOOPER MUST HAVE A SET OF TRANSITION LETTER SETS

(Loop letter sets cannot be generated without the corresponding
transition letter sets (MC**.TL).)

Other file data error conditions are identical to the CKST and GWRD
errors described in the GZEC discussion.

7. REVEXA

Title: REVEXA.SV
Purpose:

REVEXA is version A of the revised research machine exerciser.
Essentially, it is a stripped-down version of REVEX whose purpose is to
collect counter data. In contrast to REVEX, however, REVEXA does not
allow any utterance with a transition or loop letter set violation to
proceed to recognition. For a more complete description of the
operation of REVEXA, see the program description for REVEX.

Printout:

This is the same as in REVEX.

User Dialog:

This is the same as in REVEX. Here, to speed execution, the
question:

DO YOU WANT TO USE ONLY THE MACHINES IN THE UTTERANCE? (Y/N)

should be answered "Y", since the other machines would only
contribute artifacts, and, at this point, data about artifacts are not
used.

Input Files:

MNSET- - the magic number sets to be used by REVEXA
-.CD - compressed data files
MC**.TL - transition letter set for item **
MC**.LP - loop letter set for item **



ERLC

8. RVDIT

Title: RVDIT.SV
Purpose:

RVDIT is the counter data file editor. RVDIT creates from the
counter data file for a speaker (<SUB>:CDAT.RV) counter data files
for each of the machine numbers (<SUB>:CDAT**.RV where ** is the
machine number).

The user can specify which counter data records are to be flagged as
"real" in the new files to be created by inputting the file
<SUB>:RVCARDS.

Printout: None

User Dialog:

RVDIT

ENTER THE 3-LETTER SUBDIRECTORY NAME:
IS THERE A MACHINE 12 (Y/N)?

ARE THERE ANY COUNTER RECORDS TO BE MODIFIED (Y/N)?

(If the user does not wish to keep all of the counter data records,
then file RVCARDS must exist.)

WARNING -- FILES subdirectory:CDAT**.RV WILL BE DELETED
FOR ALL MACHINE NUMBERS ** (00-11).
DO YOU WISH TO CONTINUE (Y/N)?

(If specified, the program terminates with STOP PROCESSING.
Otherwise, the specified files are deleted and the program
continues.)

STOP ALL DONE

Input Files:

CDAT.RV the counter data file
CIDX.RV the counter index file
RVCARDS contains the record numbers of the counter data records to
be flagged as "real" in the new files to be created.

Output Files:

CDAT**.RV the counter data files for machine number **.
Error Messages:

CKST--FILE DOES NOT EXIST:

(If any of the input files do not exist, this message is output and
the program terminates.)

CKST--UNKNOWN ERROR:
STOP ON ERROR

(The program terminates for any of the above errors.)

9. COVERT

Title: COVERT.SV
Purpose:

COVERT computes the covariance matrix, median, delta lower, and delta
upper of the counters for each specified machine. It also calculates
the coefficients of correlation for the non-diagonal upper triangular
elements of the covariance matrix.
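
The statistics named above can be illustrated with a small Python sketch. It is an assumption-laden outline rather than the COVERT algorithm: the function name `counter_statistics` is hypothetical, each counter data record is assumed to be a list of counter values of equal length, the sample form of the covariance (division by n - 1) is assumed, and the delta lower and delta upper quantities are omitted because their definitions are not given here.

```python
from math import sqrt
from statistics import mean, median


def counter_statistics(records):
    """Return (medians, covariance matrix, upper-triangle correlations)."""
    n, k = len(records), len(records[0])
    # One column of values per counter position.
    columns = [[record[j] for record in records] for j in range(k)]
    means = [mean(column) for column in columns]
    medians = [median(column) for column in columns]
    # Assumption: sample covariance, dividing by n - 1.
    cov = [
        [
            sum((columns[i][t] - means[i]) * (columns[j][t] - means[j]) for t in range(n))
            / (n - 1)
            for j in range(k)
        ]
        for i in range(k)
    ]
    # Correlation coefficients for the non-diagonal upper triangle only.
    corr = {
        (i, j): cov[i][j] / sqrt(cov[i][i] * cov[j][j])
        for i in range(k)
        for j in range(i + 1, k)
    }
    return medians, cov, corr
```

The sketch needs at least two records and fails with a division by zero if any counter has zero variance; a real implementation would have to decide how to report such a counter.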

Printout:

For each specified machine, COVERT prints for each counter position
the selected counter equation, the ordered C-values, median, delta
lower, and delta upper. It prints the calculated covariance matrix
and the coefficients of correlation for the covariance matrix.

User Dialog:
COVERT

ENTER THE 3-LETTER SUBDIRECTORY NAME:

ENTER THE 2-DIGIT STARTING MACHINE NUMBER (00-15):

ENTER THE 2-DIGIT END MACHINE NUMBER (00-15):

(COVERT creates the covariance matrix files for the machines in
ascending order, beginning with the starting machine number and
finishing with the end machine number.

The current limits on the machine numbers are 0-15.)
CREATED FILE:

(The covariance matrix file name for the machine is displayed.)
STOP-ALL DONE FOLKS!
Input Files:

CDAT**.RV, the counter data file for machine **.
Output Files:

CM**, the covariance matrix file for machine **.

COV.ST, the counter data statistics file. A record of counter data
statistics (medians, delta uppers, delta lowers, equation flags) is
written for each machine processed by COVERT.
Error Messages:

INVALID ENTRY

(Invalid starting or end machine numbers were entered. Another input
is requested.)

CKST--FILE DOES NOT EXIST:

(If the input file does not exist for the machine being processed,
this message is output and the processing is skipped for this
machine.)

CKST--UNKNOWN ERROR:     FILE:
FILE ALREADY EXISTS:

(If the covariance matrix file already exists for the machine being
processed, this message is output and the processing is skipped for
this machine.)

10. INVERT

Title: INVERT.SV
Purpose:

INVERT calculates the inverted covariance matrix for each specified
machine. It also computes the eigenvalues of the covariance matrix
for each specified machine if so desired.

Printout:

INVERT prints the eigenvalues of the covariance matrix when the
option to compute the eigenvalues is chosen.

User Dialog:
INVERT

ENTER THE 2-DIGIT STARTING MACHINE NUMBER (00-29):

ENTER THE 2-DIGIT ENDING MACHINE NUMBER (00-29):

PRINT EIGENVALUES OF COVARIANCE MATRIX (1=YES, 0=NO):

(INVERT creates the inverted covariance matrix files for the machines
in ascending order, beginning with the starting machine number and
finishing with the end machine number.)

The current limits on the machine numbers are 0-29.
Input Files:

CM**, the covariance matrix file for machine **.
Output Files:

INCM**, the inverted matrix file for machine **.
Error Messages: 

INVALID ENTRY 

(Invalid starting or end machine numbers were entered. Another input 
is requested.) 

FILE OPEN ERROR 

(The covariance matrix file cannot be opened.)

11. CROAK

Title: CROAK.SV
Purpose:

CROAK calculates the delta and mu values for each counter data record
of a machine number and then orders and prints the delta and mu
values. CROAK also creates the file RVX.ST of statistics used by REVEX.

Printout:

For each vocabulary item, CROAK prints out the inverted covariance
matrix for that item, its determinant, and the delta and mu values in
unordered and in sorted form with their computed mean and standard
deviation. CROAK also plots the cumulative distributions of the
delta and mu values.

User Dialog:
CROAK

ENTER THE 3-LETTER SUBDIRECTORY NAME:

(Then the program requests the user to enter the data extraction
mode:)

ENTER THE DATA STATISTICS EXTRACTION MODE

1 (Mode 1 - REAL RECOGNITIONS WITH VIOLATIONS)

2 (Mode 2 - ALL REAL RECOGNITIONS)

3 (Mode 3 - REAL RECOGNITIONS WITHOUT VIOLATIONS)

4 (Mode 4 - ARTIFACTS ONLY)

(If the user enters anything but 1, 2, 3, or 4, the message "INVALID
ENTRY" appears, and the user is asked again for a mode number.)

(If any mode but 4 is chosen, the user is asked if he wishes to save
the CROAK-generated statistics:)

DO YOU WISH TO SAVE THE PROBABILITY STATISTICS? (Y/N)

(Then the program requests starting and ending machine numbers:)

"ENTER 2-DIGIT STARTING MACHINE # (00-15)"

"ENTER 2-DIGIT END MACHINE # (00-15)"

(If an illegal entry is made, the message "INVALID ENTRY" appears and
the user is asked again for starting and end machine numbers.)

When the delta values have been computed and sorted, the message

DELTA VALUES ARE STORED IN FILE

appears, and when the mu values have been computed and sorted, the
message

MU VALUES ARE STORED IN FILE

appears.

The message

STOP- ALL DONE FOLKS!

appears when the processing is complete.
Input Files:

INCM** - inverted covariance matrix file for item **

CDAT**.RV - counter data file for item **

CIDX.RV - index file for the counter data file

COV.ST - counter data statistics file
Output Files:

QDAT**.RR - file of delta values for reals

QDAT**.AF - file of delta values for artifacts

MUDT**.RR - file of mu values for reals

MUDT**.AF - file of mu values for artifacts

RVX.ST - statistics file for REVEX
Error Messages:

DELTA FILE FOR THIS ITEM ALREADY EXISTS:

MU FILE ALREADY EXISTS FOR THIS ITEM

(Either QDAT** or MUDT** already exists and should be deleted or
renamed before invoking CROAK again.)

PROBLEM CREATING

(CROAK was not able to create the named file.)
CKST--FILE DOES NOT EXIST
CKST--UNKNOWN ERROR

12. REVEX

Title: REVEX.SV
Purpose:

REVEX is the revised research machine exerciser. It was designed to
serve one of many functions depending upon the particular subroutines
loaded with it. The version implemented for this phase of the
project is the counter data extraction version. It can operate using any
number of machines. For each utterance, it finds all machines which
go to recognition and saves the counter data collected, that is, the
number of letters which occurred in each transition and loop letter
set for every machine which went to recognition. In this version,
both loop and transition letter violations are allowable.

Printout:

REVEX provides three types of printout. The first provides a
detailed history of the progress of each copy of each active machine.
It is read as follows. The machine number is shown in the heading.
Machine 0 is that constructed for the word zero. Machine 10 is that
which recognizes "point" and is shown as "?" in printout 3. When two
separate machines exist for a single vocabulary item, their histories
are combined under the one column for that machine. The initial or
universal machine start is marked by a "Z", while the non-initial
version starts are marked with the typical "S".

The stage of each machine is shown for each letter in the utterance
(the letter number appears on the left and can be correlated with the
utterance printout given in printout 3). The stage is described by a
single number or letter, as shown in Table A2. (If the numerical
stage exceeds 9, only the units digit is printed.) Thus the progress
of a copy of a machine can be traced by simply following the
specified print column. Note that when the number of copies of a machine
exceeds the space available, data for copies in the next print column
is shifted to the right to allow data for all copies to be printed.
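
The single-character stage descriptor can be expressed as a short Python sketch. This is an illustration only: the function name `stage_symbol` and the "T"/"L" state codes are hypothetical, and the loop-state lettering (stage 1 is "A", stage 10 is "J", stage 11 is "K") is inferred from the entries that survive in Table A2.

```python
import string


def stage_symbol(stage, state):
    """Return the one-character descriptor for a machine stage.

    state is "T" for the transition state or "L" for the loop state.
    """
    if state == "T":
        # Only the units digit is printed once the stage exceeds 9.
        return str(stage % 10)
    if state == "L":
        # Assumption: loop stages are lettered A, B, C, ... in order.
        return string.ascii_uppercase[stage - 1]
    raise ValueError("state must be 'T' or 'L'")
```

Because transition stages 1 and 11 both print as "1", a reader tracing a long machine has to rely on the letter number at the left of the printout to tell them apart.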

Special symbols are used in addition to the stage descriptors. Their 
meaning is shown in Table A3. 

Finally, a line appears at the end indicating which machine copies in 
the final stage were forced to recognition at the end of the 
utterance. 

Printout 2 lists relevant data about the recognitions, including the
loop letter violations. The "start order" values are used to
correlate these data with printout 3. The universal machine descriptors
are user selectable. Here "2U" distinguishes that machine from the
non-initial version ("2").

Printout 3 gives the utterance and maps the recognitions onto it in
order of start time. The characters refer to the machine, as
discussed for printout 1. A symbol marks the time the machine
spent in its last stage.

TABLE A2. Stage and State Descriptors Used in REVEX Type 1 Printout

           State
Stage      Transition    Loop

1          1             A
2          2             B
.          .             .
.          .             .
10         0             J
11         1             K


TABLE A3. Meaning of Special Symbols in REVEX Type 1 printout 



Symbol 
Category 

Machine start 



Parent copy 



Symbol 



S 



$ 



/ 



Meaning 

The violation-free start of a particular
copy of a machine. Not used for universal
machines. Always stage 1.

The violation-free start of a particular
copy of the universal version of a machine.
Always stage 1.

Machine copy start on a transition letter
violation. Can occur in stage 1 only for
the first letter of the utterance.
Thereafter, the stage is one greater than
parent's stage.

Marks the parent copy of the "$" to the
left of this copy. Indicates parent is in
T state.

Parent in L state. 

Parent in the L state with an acceptable 
violation this letter. 

Parent dropped due to excessive loop 
violations.

TABLE A3. Meaning of Special Symbols in REVEX Type 1 printout (Cont)

Meaning 



Symbol 
Category 

Violations 



Dropped copies 



Final Stage 
Recognition 



Symbol 
$ 



X 

/ 



\ 



Transition letter violation within
acceptable limits such that a new copy (the
offspring) was started.

L state, letter not in L (i.e., an L
violation).

Copy dropped due to excessive L violations.

Copy dropped due to excessive L violations
after having sired an offspring.

Copy not selected for advancement to the
next stage because a) a better copy was
advanced; or b) a copy in the next stage
was better than this copy.

Copy dropped because a copy advancing to or
created in this stage is a better copy.

Copy dropped after recognition.

Copy in final stage delay, awaiting
recognition.

Recognition 
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UMr Oialogt 
RSVEX 

EMTBR l«MS or SOBDIRICTORY 
WKERS DATA PILES RESIDE 
(MUST BE 3 CHARACTERS): 

ENTER MODE OP DATA ACQUISITION 
(MODE 1 - SSG PORMAT PILE 
MODE 2 - LIVE UTTERANCE 
MODE 3 - MAGIC NUMBER SET 
MODE 4 - INDIVIDUAL -.CD PILES): 

(The liv« uttarance option will be implemented in a subsequent phase. 
If mode 1 is selected, the system responds.) 

I 

ENTER NAME OP EXAMPLE SPACE: 

(If mode 3 is selected, the system requests,) 

ENTER NAME OF MAGIC NUMBER SET: 

(The system searches for and reads the machines, and explains the 
brief pause to the user) 

READING INDIVIDUAL MACHINES . . . 

(User dialog continues after machines are found) 

EXTRA PRINTOUT TO $LPT? (Y OR N): 

(This option allows MINIMINT printout to be directed to a disk file
RV.LS if "N" is entered. If this listing file already exists, the
system asks,)

LISTING FILE EXISTS
MAY I DELETE IT? (Y OR N):

(A "N" response causes the system to open the existing listing file
for appending.)

DO YOU WANT THE LONG STATE PRINTOUT? (Y OR N): 

(A "N" response causes the printout described as type 1 to be
suppressed.)

DO YOU WANT THE MACHINE OVERLAP PRINTOUT? (Y OR N):

(A "N" response causes the type 3 printout to be suppressed.)

FILES EXIST, FILES: SUB:CIDX.RV SUB:CDAT.RV

MAY I DELETE THEM? (Y OR N):

(This message is output if counter data files already exist on the
specified subdirectory. A "N" response causes the system to append
the new data to the existing files, and the message is output,)

APPENDING TO EXISTING FILES

(When a machine is found for a vocabulary item above 10, the system 
asks , ) 

MACHINE ** FOUND. IT IS ASSUMED TO BE AN INITIAL MACHINE.
ENTER VOCABULARY ITEM TO WHICH IT CORRESPONDS (0-10):

(The system then requests a descriptor for use in type 2 and 3
printouts,)

ENTER 2 CHARACTER DESCRIPTOR FOR MACHINE
(E.G. '2U'):

(Finally, the starting time for the non-initial version is
requested,)

ENTER TIME ** SHOULD STOP AND XX BEGIN:

(The system offers the option of activating only those machines which 
are actually in the utterance. To use all machines, enter "N") 

DO YOU WANT TO USE ONLY THE MACHINES IN THE UTTERANCE? (Y OR N): 

(A pause for system initialization follows, and then the system 
requests , ) 

ENTER DESCRIPTION OF THIS RUN: 

(The user may enter a 40 letter descriptive string which is printed 
in the header . ) 

ENTER NAME OF UTTERANCE FILE 
(OR TO TERMINATE): 

(This request is made for every utterance if mode 4 was selected
above. Otherwise no further user inputs are required.)

STOP REVEX IS FINISHED

Input Files:

MC**.TL, the transition letter sets for the vocabulary items

MC**.LP, the loop letter sets

-.CD, the compressed data files.

ES- or MNSET*, example space or number set files if either form of
entry was selected.

COV.ST, counter data statistics

INCM**, inverted covariance matrix files

RVX.ST, statistics file created by CROAK

Output Files:

CIDX.RV, the index file into CDAT.RV in which the data for each 
utterance are kept. 

CDAT.RV, the set of counter data, start and end times and loop viola- 
tions for each machine which goes to recognition. 

Error Messages: 

ILLEGAL MODE 
ENTER MODE 

(This message is printed when the mode is not in the range 1 ≤ mode ≤
4.)

****WARNING: -.RV FILES ARE NOT COMPATIBLE

CURRENT BYTES: XX, OLD BYTES: XX

STOP

(This message occurs when an attempt is made to append to counter
data files created under a different revision of REVEX. The differing
record sizes make it impossible to append. The old counter files
must be deleted or renamed.)

Filename DOES NOT EXIST 

(This message appears when a particular machine is not found. REVEX
assumes this machine is not to be used and continues. REVEX can
operate with 1 to 13 machines in its present configuration.)

NO SPACE IS AVAILABLE TO INSERT A MACHINE COPY FOR
VOCABULARY ITEM #: XX

(This message is output to the printer when no space is available in
the machine copy data array. The size of this array must be changed
to accommodate extra copies if this error is encountered.)

CKST and GWRD file data errors are the same as those described for
GZEC.

13. ADDER

Title: ADDER.SV
Purpose:

ADDER lists the transition and loop letter set violations by vocabu- 
lary item for each utterance which has been processed by REVEX. This 
violation data is stored in two tables: one for real recognitions 
and one for artifacts. 

Printout: 

If desired, ADDER prints violation data for each utterance processed
as well as the two violation tables.

User Dialog: 

ADDER 

(The program then responds: ) 
PROGRAM ADDER 

DO YOU WANT TO PLACE OUTPUT ON A DISK FILE? (Y/N) 

(If the answer is yes, the program responds:) 
ENTER DESIRED FILENAME (16 CHAR MAX) 

(If a bad filename is chosen, the program prompts: ) 
SOMETHING IS WRONG WITH YOUR CHOICE OF FILENAME. 
CHOOSE ANOTHER. 
Then:

ENTER RELEVANT COMMENT (40 CHAR MAX) 

ENTER DISK CONTAINING CDAT, CIDX DATA FILES (3 CHAR) 
(e.g. , enter DP2) 

ENTER SUBDIRECTORY LOCATION OF CDAT, CIDX DATA FILES (3 CHAR) 
DO YOU WANT THE LONG FORM PRINTOUT? (Y/N) 

(ADDER then proceeds to process utterances one by one, storing
violation data in the two violation matrices.)

Input Files:

CDAT.RV, counter data file from REVEX
CIDX.RV, index file for the counter data file
Output Files:

LTVF.MM, file containing violation table used by DEALER
Error Messages:

STOP PROBLEM WITH LTVF.MM

(There was a problem opening the file LTVF.MM)
KARMA -- FILE DOES NOT EXIST
(Either CDAT.RV or CIDX.RV does not exist)
KARMA -- UNKNOWN ERROR

(Problem with status of CDAT.RV or CIDX.RV files)



NAVTRAEQUIPCEN 78-C-0141-1

14. AVRAJ

AVRAJ computes average word length for real recognitions.
Printout:

AVRAJ prints out the average word length for each machine.
User Dialog:

AVRAJ

(The program responds:)
PROGRAM AVRAJ

ENTER DISK CONTAINING CDAT, CIDX DATA FILES (3 CHAR)

ENTER SUBDIRECTORY LOCATION OF CDAT, CIDX DATA FILES (3 CHAR)

(The program then proceeds to run through the CDAT.RV file,
calculating average word length for each vocabulary item over all
real recognitions of that item. When this process is complete, the
message below appears.)

AVERAGE WORD LENGTHS ALSO EXIST ON BINARY FILE: AVRWRD.ST

STOP AVRAJ IS FINISHED.
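
The averaging performed by AVRAJ can be pictured with a small sketch. The record layout of CDAT.RV is not given in this manual, so the tuple layout (item, start time, end time, real/artifact flag) and the definition of word length as end time minus start time are assumptions made only for illustration; the function name is hypothetical.

```python
from collections import defaultdict


def average_word_lengths(recognitions):
    """Average word length per vocabulary item over real recognitions.

    `recognitions` is an iterable of (item, start_time, end_time, is_real)
    tuples; this layout is assumed for illustration only.
    """
    totals = defaultdict(lambda: [0, 0])  # item -> [summed length, count]
    for item, start, end, is_real in recognitions:
        if not is_real:
            continue  # artifacts are excluded from the average
        totals[item][0] += end - start
        totals[item][1] += 1
    return {item: total / count
            for item, (total, count) in sorted(totals.items())}


# Hypothetical records: two real recognitions of item 2, one artifact of
# item 2, and one real recognition of item 7.
sample = [(2, 10, 30, True), (2, 40, 64, True), (2, 5, 9, False), (7, 0, 18, True)]
averages = average_word_lengths(sample)
```

Only the real recognitions contribute, which matches the description above; how AVRAJ actually distinguishes reals from artifacts in CDAT.RV is not stated here.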
Input Files: 

CDAT.RV Counter data file created by REVEX

CIDX.RV Index file to CDAT.RV
Output Files:

AVRWRD.ST File of average word lengths
Error Messages:

KARMA -- FILE DOES NOT EXIST

(Either CDAT.RV or CIDX.RV file does not exist on the specified
subdirectory.)

KARMA -- UNKNOWN ERROR

(Problem with status of CDAT.RV or CIDX.RV files.)

STOP PROBLEM OPENING AVRWRD.ST

(There was a problem opening AVRWRD.ST.)
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» 

15. CRAP

Purpose:

CRAP determines the critical word association factors for each
vocabulary item, going through each utterance processed by REVEX to check
associations occurring in the utterance at nine levels of overlap.

Printout: 

For each of the nine levels of overlap, CRAP prints out a matrix of 
overlap parameters. Also, the critical association parameters for 
each vocabulary item are printed. 

User Dialog: 
CRAP 

(And the program responds:) 
PROGRAM CRAP 

DO YOU WANT TO PLACE OUTPUT ON A DISK FILE? (Y/N) 

(If the user answers affirmatively, the program requests a filename.
The disk file, if chosen, receives all output that would otherwise go
to the printers.)

(Then the program asks for the location of the CDAT.RV and CIDX.RV
files:)

ENTER DISK CONTAINING CDAT, CIDX DATA FILES (3 CHAR):
(e.g., enter "DP2")

ENTER SUBDIRECTORY LOCATION OF CDAT, CIDX DATA FILES (3 CHAR)

(The program then asks for the machine types of machines 11 and 12:)

ENTER VOCABULARY TYPE FOR MACHINE 11: (I.E. '2' or 'A')

ENTER VOCABULARY TYPE FOR MACHINE 12: (I.E. '2' or 'A')

(An entry of -1 should be made if the machine is not being used.)

(Then, at the end:) 

CRITICAL ASSOCIATION PARAMETERS HAVE BEEN OUTPUT TO CAP.ST
STOP CRAP FINISHED

Input Files:

AVRWRD.ST - file of average word lengths

CDAT.RV - counter data file

CIDX.RV - index file for the counter data file

Output Files:

CONGAP - file for contiguous real gap matrix
CAP.ST - file of critical association parameters

Error Messages:

STOP PROBLEM OPENING OUTPUT FILE - for disk file output
STOP PROBLEM OPENING CONGAP

STOP PROBLEM OPENING AVERAGE WORD FILE "AVRWRD.ST"

SOMETHING IS WRONG WITH YOUR CHOICE OF FILENAME. CHOOSE ANOTHER

If a disk file output is chosen and the filename given is not
acceptable, another name is asked for.

16. GAPSTER

Title: GAPSTER.SV
Purpose:

GAPSTER is responsible for determining the association, gap and delay 
values; GAPSTER creates the gap matrix GAP.DT and the QASM matrix 
needed in the MIND file. 

Printout: 

GAPSTER prints out time gap statistics for real and artifact recognitions
as well as the gap matrix and the QASM matrix.

User Dialog: 

GAPSTER 

PROGRAM GAPSTER 

DO YOU WANT TO PLACE OUTPUT ON A DISK FILE? (Y/N) 

(If the user answers affirmatively, the program requests a filename. 
If the filename chosen is bad, the program asks for another name.) 

(Then, just as in CRAP, the program asks for the location of the
CDAT.RV and CIDX.RV data files.

If these data files are found, the machine types of machines 11 and
12 are requested.)

ENTER VOCABULARY ITEM FOR MACHINE 11

ENTER VOCABULARY ITEM FOR MACHINE 12

(If a machine is not being used, enter "-1".)

(The program then asks for the critical association factor:)

ENTER CRITICAL ASSOCIATION FACTOR

(The recommended response here is "1.0")

(At this point, the user is asked to enter the total number of
machines to be used.)

ENTER TOTAL NUMBER OF MACHINES TO BE USED

(For example, if machines 0-10 were being used, the response would be
"11")

(After running a few minutes, the program halts and requests:)

ENTER REAL STD. DEV. SPREAD FACTOR FOR GAP MATRIX
(The recommended reply here is "1.0")
(A little bit later the user is asked:)

DO YOU WANT THE QUARTILE AND MEDIAN CALCULATIONS? (Y/N)
(These calculations are not required and may be omitted.)
(Then, at the end:)
STOP GAPSTER IS FINISHED
Input Files:

CONGAP - file of contiguous real gap matrix
Output Files:

GAPMAX - file holding maximum gap value

GAP.DT - file holding gap matrix

QASM.DT - file holding QASM matrix
Error Messages:

STOP PROBLEM OPENING OUTPUT FILE - for disk listing file

STOP PROBLEM WITH OPENING CONGAP

STOP PROBLEM OPENING GAPMAX

STOP PROBLEM OPENING CAP.ST

FILE DOES NOT EXIST - the compressed data file in question does not
exist

STOP PROBLEM OPENING QASTMP - cannot open QASM.DT



17/18. SORTRA, SORTRB
Title: SORTRA.SV, SORTRB.SV
Purpose:

SORTRA/SORTRB sorts the file GAPMAX/CONGAP, putting the entries in
ascending numerical order.

Printout:

SORTRA prints a plot of the ordered values of the file
GAPMAX, and SORTRB does the same thing for the file CONGAP.

User Dialog:

SORTRA (SORTRB)

(SORTRA/SORTRB proceeds to sort the entries in the file GAPMAX/CONGAP
and then plot the sorted values.)

Input Files:

GAPMAX - for SORTRA

CONGAP - for SORTRB
Error Messages: 

None 



19. MUTE

Title: MUTE.SV

MUTE computes the L-counter parameters MDLA0, MDLA1, and MDLA2.
Printout:

MUTE prints out the MDLA0, MDLA1, and MDLA2 values for each machine,
as well as the parameters for reals and artifacts and for
reals and artifacts.

User Dialog:
MUTE

ENTER 2-DIGIT STARTING MACHINE NUMBER
ENTER 2-DIGIT END MACHINE NUMBER

(MUTE proceeds to compute the MDLA** values for each machine beginning
with the starting number machine.)

Input Files:

MUDT**.RR - file of values for reals of item **
MUDT**.AF - file of values for artifacts of item **

Output Files:

LOOPY - file of MDLA** values for all machines
QLSTATS - file of and values for all machines

Error Messages:

TOO MANY MU VALUES - the number of values exceeds 600
FILE OPEN ERROR - one of the MUDT** files cannot be opened



20. GLOVE

Title: GLOVE.SV
Purpose:

GLOVE is a least squares routine designed to fit a curve through the
observed points of the cumulative distribution of the Q*p quality
function values for both real and artifact recognitions.

Printout:

GLOVE prints out five coefficients for each real vocabulary item and
five coefficients for each artifact vocabulary item. These coefficients
are the "adjustable parameters" determined so that, with these
values used as coefficients in the general functional form, the fit
to a particular set of data points is best in the sense of least
squares.

User Dialog:

GLOVE

ENTER STARTING MACHINE NUMBER (0-29)
ENTER END MACHINE NUMBER (0-29)

(GLOVE then proceeds to compute coefficients for each vocabulary
item, real and artifact, doing reals first in ascending order then
artifacts in ascending order.)



Input Files:

QDAT**.RR - file of real delta values for item **
QDAT**.AF - file of artifact delta values for item **

Output Files:

QDFT**.RR - file of coefficients, median delta and range for reals
QDFT**.AF - file of coefficients and median delta for artifacts
APR - file holding number of real deltas for each vocabulary
item
APA - file holding number of artifact deltas for each vocabulary
item

Error Messages:

FILE STATUS ERROR FOR ITEM*

TOO MANY DELTA VALUES FOR THIS ITEM

(The number of deltas exceeds available array size.)




21. TAILOR

Title: TAILOR.SV
Purpose:

TAILOR calculates T-counter quality function values. TAILOR reads
coefficients and a range of values to fit from files QDAT**.RR and
QDAT**.AF (where ** is the machine type), then fits a quadratic to
the ratio of the real and artifact data. The coefficients of the
fitted curve are then transformed into a MEX usable form and written
into the file WHAT.
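
As a rough illustration of the fitting step described above, the following sketch fits a quadratic to a set of (delta, ratio) points by solving the 3 x 3 normal equations, and also returns the determinant of that matrix, since the TAILOR printout reports a determinant. The function names, the sample data, and the use of Cramer's rule are assumptions for illustration only; this manual does not state TAILOR's numerical method, nor the transformation applied before the values are written to WHAT.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(xs, ys):
    """Least-squares fit of y = c0 + c1*x + c2*x**2.

    Returns (coefficients, determinant of the normal matrix).
    """
    # Power sums s[k] = sum(x**k) and moments t[k] = sum(y * x**k).
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    normal = [[s[i + j] for j in range(3)] for i in range(3)]
    d = det3(normal)
    if d == 0:
        raise ValueError("normal matrix is singular; need 3 distinct x values")
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace one column with the right-hand side.
        m = [row[:] for row in normal]
        for r in range(3):
            m[r][col] = t[r]
        coeffs.append(det3(m) / d)
    return coeffs, d


# Hypothetical data lying exactly on a quadratic, so the fit can be checked.
deltas = [0.0, 1.0, 2.0, 3.0, 4.0]
ratios = [1.0 + 2.0 * x + 0.5 * x * x for x in deltas]
coeffs, determinant = fit_quadratic(deltas, ratios)
```

A determinant near zero signals that the delta values do not span enough distinct points for a stable fit, which is one plausible reason for printing it.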

Printout: 

The real coefficients from QDAT**.RR.
The range of delta values to be used.
The coefficients for artifacts.

The range of delta values actually used in the fit.
The determinant of the matrix used in the least squares fit.
The coefficients of the fitted curve.
A plot of the fitted curve and data points.
The MDTA* values.
User Dialog:

TAILOR [beginning machine number/B] [ending machine number/E]

If /B option is omitted, beginning machine number is assumed to be 0.

If /E option is omitted, ending machine number is assumed to be 10.

TAILOR will type the matrix generated by the least squares fit for 
each machine type processed. 

Input Files: 

QDAT**.RR, coefficients for real recognition 

QDAT**.AF, coefficients for artifacts, where ** goes from 00 to 10
Output Files:

WHAT T-counter quality function values
Error Messages:

STOP - OPEN ERROR - QDAT**.-
STOP - READ ERROR - QDAT**.-

22. BUILDER

Title: BUILDER.SV
Purpose:

BUILDER builds the machine data file MDFL.MM from the input files
LOOPY and WHAT created by MUTE and TAILOR respectively.
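
The merge can be sketched as follows, including the entry-count check that corresponds to the LOOPY AND WHAT INCOMPATIBLE error listed for this program. The on-disk layouts of LOOPY, WHAT, and MDFL.MM are not described in this manual, so the sketch assumes each file has already been read into a list with one entry per machine; the function name and dictionary keys are hypothetical.

```python
def build_machine_data(loopy, what):
    """Merge per-machine L-counter and T-counter values.

    loopy: list of MDLA* value tuples, one per machine (assumed layout).
    what:  list of MDTA* value tuples, one per machine (assumed layout).
    """
    if len(loopy) != len(what):
        # Mirrors the documented fatal error for mismatched entry counts.
        raise ValueError("LOOPY AND WHAT INCOMPATIBLE")
    return [{"machine": i, "mdla": tuple(l), "mdta": tuple(t)}
            for i, (l, t) in enumerate(zip(loopy, what))]


# Hypothetical values for two machines.
merged = build_machine_data([(1, 2, 3), (4, 5, 6)], [(0.1, 0.2), (0.3, 0.4)])
```

Checking the counts before merging avoids silently pairing values that belong to different machines.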

Printout:

None
User Dialog:

BUILDER

(BUILDER then proceeds to read the files LOOPY and WHAT and then
merge them to create MDFL.MM. When this is complete, the message

MDFL.MM CREATED

appears at the CRT.)
Input Files:

LOOPY, file of MDLA* values created by MUTE

WHAT, file of MDTA* values created by TAILOR
Output Files:

MDFL.MM, machine copy data file needed by DEALER
Error messages:

PROBLEM OPENING LOOPY - cannot open LOOPY

PROBLEM OPENING WHAT - cannot open WHAT

LOOPY AND WHAT INCOMPATIBLE - the number of entries in LOOPY is not
the same as the number of entries in WHAT. This terminates the
program.

23. DEALER

Title: DEALER.SV

DEALER puts together the various files created by CRAP, GAPSTER, and
BUILDER to create the file MIND.VD.

Printout:

None
User Dialog:

DEALER

PROGRAM DEALER

ENTER REVISION NUMBER FOR THIS JOB:
(The current revision number is •♦0")

(The program then asks for number of vocabulary items, and the disk
and subdirectory containing the data.)

ENTER NUMBER OF VOCABULARY ITEMS (0-13)

ENTER "DISK: SUBDIR": (8 CHAR. MAX)

(For example, the user might enter "DP2:ABC")

If there are universal machines (machines 11 and 12), the program
requests

ENTER VOCABULARY ID AND END TIME FOR MACHINE SEPARATED BY COMMA

(So, for example, the user might enter "2,25" for machine 11, indicating
that machine 11 is the universal machine for vocabulary item 2,
and that the end time for this item is 25.)

Finally, if no errors occur, the user is asked for a comment to add to
the data file.

Then

LOOKS LIKE WE MADE IT, FOLKS

appears, and the MIND file is complete, except for the play factors.

Input Files:

LTVF.MM - transition/loop letter set violation table file

MC**.TL - transition letter set files

MC**.LP - loop letter set files

COV.ST - covariance statistics file

INCM** - inverted covariance matrix files

RVX.ST - statistics file created by REVEX

MDFL.MM - machine data file created by BUILDER

GAP.DT - gap matrix file

CAP.ST - critical association parameter file
Output File:

MIND.VD - incomplete MIND file
Error Messages:

ERROR NO. OCCURRED IN STAT CALL FOR FILE

(The status of the named file is bad. The RDOS error code is used.)
STOP -- TOO MANY STAGES

(The number of transition letter sets is too large.)
STOP PROBLEM WITH COV.ST
STOP PROBLEM WITH RVX.ST
STOP PROBLEM WITH A INCM** FILE
STOP PROBLEM WITH MDFIL - MDFL.MM file is bad
STOP PROBLEM WITH LTVF
STOP PROBLEM WITH QASM.DT
STOP PROBLEM WITH GAP.DT

(Usually an error of this kind simply means that the named file does
not exist on the subdirectory in question.)

"MIND" IS WARPED - status of output file is bad, enter another
GIVE ME A NEW FILENAME
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24. PHEW 

Title: PHEW.SV
Purpose:

PHEW writes the three play factors to the end of the MIND file.
Printout:

PHEW prints out the a priori costs for each machine.
User Dialog:
PHEW

ENTER NUMBER OF VOCABULARY ITEMS

(PHEW opens the MIND.VD file for appending, computes a priori costs
for each machine and writes these costs to the end of the MIND file
together with the three gap matrix play factors. When this is
accomplished the message

MIND HAS BEEN CREATED

appears on the CRT.)

Input Files:

MIND.VD, the data file created by DEALER

APR, the file holding number of real deltas for each machine
APA, the file holding number of artifact deltas for each machine

Output Files:

MIND.VD, the complete MIND file

Error Messages:

PROBLEM OPENING MIND.VD - the file created by DEALER cannot be
opened.

VOCABULARY ITEM NUMBER MISMATCH - the number of vocabulary items
input does not match the number used in DEALER.

25. ESDIT (Auxiliary)

Title: ESDIT.SV

Purpose:

ESDIT is the example space file editor. It provides the capability
to change individual start/stop values in the example space using an
ESG or GENRLIZ printout. For a GENRLIZ printout, the starting record
number and the end record number of the compressed data entered by
the user must be offset.

Printout:

ESDIT produces a printout of the edit showing the element in the
example space which was changed, and the old and new values associated
with it.

User Dialog: 

ESDIT 

ENTER NAME OF EXAMPLE SPACE FILE:

ARE YOU EDITING FROM AN ESG PRINTOUT? (Y OR N):

(If the user answers no, then)

ARE YOU EDITING FROM A GENRLIZ PRINTOUT? (Y OR N):
ENTER RECORD NUMBER:

FILE: , STARTING RECORD: , ENDING RECORD:

(This is a statement of the current information in the specified
record number.)

ENTER NEW STARTING RECORD:
ENTER NEW ENDING RECORD:

ARE THERE OTHER CHANGES TO THIS EXAMPLE SPACE FILE? (Y OR N):

(If there are further changes, inputs of record number, new starting
record, and new ending record are requested.)

DO YOU WANT TO PROCESS ANOTHER EXAMPLE SPACE? (Y OR N):

(If another example space is to be processed, the user dialog is
repeated.)

STOP

Input Files:

ES-, the example space file.
Output Files:

On output, the example space file is updated.
Error Messages:

STOP FILE STATUS ERROR

(The example space file status must be perfect for the program to
continue.)





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26. ESGDIT (Auxiliary) 

Title: ESGDIT.SV
Purpose:

This stand alone program operates on an existing example space file
to produce a new example space file in which all utterances beginning
with the vocabulary item specified are omitted.

Printout:

A hardcopy listing of utterances used and those omitted is produced.
User Dialog: 
ESGDIT 

ENTER NAME OF EXAMPLE SPACE:

ENTER VOCABULARY ITEM (0...P):

ENTER NEW EXAMPLE SPACE NAME:

OLD FILE DESCRIPTION: file description
ENTER NEW FILE DESCRIPTION:

STOP ESGDIT IS FINISHED
Input Files:

ES-, the example space file
Output Files:

ES-, the new example space file

Error Messages:

FILE ALREADY EXISTS, FILE: file name
ENTER NEW EXAMPLE SPACE NAME:

27. GASP (Auxiliary)

Title: GASP.SV
Purpose:

GASP (the Great American Speech Printout routine) was created during
the closing moments of the LCSR project phase 0 for the final report.
For each specified machine number, GASP prints the transition letter
sets or the merged transition and loop letter sets.

Printout:

GASP prints out either transition letter sets or merged transition
and loop letter sets for each specified machine number.

User Dialog: 
GASP 

ENTER THE 3-LETTER SUBDIRECTORY NAME: 

ARE THE TRANSITION AND LOOP LETTER SETS TO BE MERGED IN THE PRINTOUT 
(Y/N)? 

(If the letter sets are not merged, only the transition letter sets 
are printed for the machine number.) 

ENTER THE STARTING MACHINE NUMBER: 

ENTER THE END MACHINE NUMBER: 

(GASP prints the machines in ascending order, beginning with the
starting machine number and finishing with the end machine number.
The current limits on the machine numbers are 0-15.)

STOP-ALL DONE

Input Files:

MC**.TL, Transition letter set file for the machine number **
MC**.LP, Loop letter set file for the machine number **
Output Files:
N/A

Error Messages:

INVALID ENTRY

(Invalid starting or end machine numbers were entered. Another input
is requested.)

CKST -- FILE DOES NOT EXIST:

(If any of the input files do not exist for machine number **, this
message is output and no machine printout is made.)

CKST -- UNKNOWN ERROR: FILE:

28. GWIZ (Auxiliary)

Title: GWIZ.SV
Purpose:

GWIZ is an auxiliary investigative program which delineates the words
within an utterance using the given WIZARD statistics. It prints the
compressed data file blatantly noting the delineations. The compressed
data files to be used are listed in SUB:GWIZ.CD.

Printout:

GWIZ prints the compressed data files noting the delineations.
User Dialog:

GWIZ

ENTER THE 3-LETTER SUBDIRECTORY NAME:
STOP
Input Files:

GWIZ.CD, file containing the compressed data file names.

WIZ.ST, file of length and stretch factors which resides on the main
directory.

-.CD, the specified compressed data files.
Output Files:

None
Error Messages:

CKST -- FILE DOES NOT EXIST

CKST -- UNKNOWN ERROR: FILE:

(If a file error is detected on file WIZ.ST, then GWIZ terminates
with the message

STOP - FILE WIZ.ST DOES NOT EXIST

GWIZ also terminates on an error from file SUB:GWIZ.CD. GWIZ outputs
the error message and continues processing on an error from a
compressed data file.)

29. MEND (Auxiliary)

Title: MEND.SV
Purpose:

MEND is an auxiliary program which creates example spaces for all the
vocabulary items from the handmarked input data obtained from the
GWIZ printout. These are the example spaces to be input to GZEC.

Programs GWIZ and MEND bridge the need to execute WIZARD and ESG
given a WIZARD statistics file.

Printout:

None
User Dialog:

MEND

ENTER THE 3-LETTER SUBDIRECTORY NAME:

WARNING-- FILES ES$<SUB>$** WILL BE DELETED FOR ALL MACHINE
NUMBERS ** (00-11).

DO YOU WISH TO CONTINUE (Y/N)?

(If "N" is specified, the program terminates with STOP PROCESSING.
Otherwise, the specified files are deleted and the program continues.)

STOP ALL DONE
Input Files:

MEND.WD, file of handmarked input data to create the example spaces.

The handmarked input data file is organized as follows: Two lines
are associated with data coming from each -.CD file. The first line
(beginning in column 1) contains the number of words in the utterance
and the -.CD filename. (For example, the utterance "1234" might produce:
"4, LHN:A1234.CD" where LHN is the subdirectory which holds the
-.CD files and the A1234.CD is the relevant condensed data file.) The
second line entry has the format: machine number, beginning record,
end record separated by commas for each word in the utterance.

Output Files:

ES$SUB$**, example spaces for machine **, ** = 0,11

Error Messages:

CKST -- FILE DOES NOT EXIST:

CKST -- UNKNOWN ERROR: FILE:

(If there is an error from the input file MEND.WD, the program
outputs the message and terminates with)

STOP ON ERROR

A.9 FILE DESCRIPTION OF VDGS USER-CREATED FILES

File descriptions of VDGS user-created files are presented on the following
pages.

File: MNSET*, where * is a 1-character number set identifier.

Description: The card images for the number sets include the number
spoken and the file name of the compressed data file as
shown below. The digits plus the word "point" comprise the
base vocabulary.

Note that the list includes 2 to 4 word numbers and that
each list has the following properties:

1. Every digit occurs 15 times; the word "point" occurs 14
times.

2. Every digit occurs first 4 times and last 4 times; so
does the word "point".

3. Every transition between two digits (e.g., 67, 68, 99,
etc.) and between a digit and the word "point" and
between the word "point" and a digit, occurs exactly
once in each set.

For data collection purposes, each set is augmented with the
eleven base vocabulary words; hence each set consists of 55
numbers (including the single word "point").
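
The three properties and the stated set size of 55 are mutually consistent, which can be checked with a little arithmetic. The sketch below is only a consistency check on the counts quoted above; it assumes, as the list of transitions implies, that a "point"-to-"point" transition never occurs, and the variable names are illustrative.

```python
DIGITS = 10            # vocabulary items 0 through 9
VOCAB = DIGITS + 1     # the digits plus the word "point"

# Property 3: digit-to-digit, digit-to-point, and point-to-digit
# transitions each occur exactly once per set.
transitions = DIGITS * DIGITS + DIGITS + DIGITS

# Property 2: every vocabulary item begins exactly 4 multi-word numbers.
multi_word_numbers = 4 * VOCAB

# Property 1: 15 occurrences of each digit, 14 of "point".
total_words = 15 * DIGITS + 14

# A number of n words contains n - 1 transitions, so summed over the set
# the transitions must equal total words minus the number of numbers.
assert total_words - multi_word_numbers == transitions

# Each digit precedes 11 items (10 digits and "point") and ends 4 numbers;
# "point" precedes 10 digits and ends 4 numbers.
assert (DIGITS + 1) + 4 == 15
assert DIGITS + 4 == 14

# Augmenting with the eleven single-word entries gives the full set.
set_size = multi_word_numbers + VOCAB
```

Under these assumptions the counts work out to 120 transitions spread over 44 multi-word numbers, plus 11 single-word entries.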



Created By: N/A
Format: Randomly organized, 256 words/record

Columns   Contents
1 - 4     Number, right-justified
5 - 6     Blanks
7 - 12    File name, ending with right-justified

Card image file for the counter data file editor

Description: The card images for the counter data file editor contain the
counter data record numbers to be kept in the new counter
data file to be created.



Created By:

Format: Randomly organized, 256 words/record

Note: The counter data record numbers to be kept are written from
right to left in ascending order.

Columns   Contents
1 - 2     The number of entries on this card, right-justified (no more
          than 10 entries. Otherwise, it is blank and 10 entries are on
          the card.) The last card contains -1 and the remainder of the
          card is blank.
3 - 4     Blanks
5 - 8     Counter data record number to be kept, right-justified, in
          ascending order
9 - 10    Blank
11 - 14, 17 - 20, 23 - 26, 29 - 32, 35 - 38, 41 - 44, 47 - 50, 53 - 56, 59 - 62
          Blanks or counter data record numbers to be kept,
          right-justified for each entry, in ascending order

Prompting file for ESG or GWIZ

File: PFILE for ESG, GWIZ.CD for GWIZ

Description: This file holds the card images of the names of all the
compressed data files comprising the total set of training
data.

Created By: User
Format: Randomly organized, 256 words/record

Columns   Contents
1 - 8     compressed data file name
          (including -.CD extension)

File of hand-cutting results

File: MEND.WD



Description: This user-created file holds all data gleaned from the
hand-cutting procedure including the number of words in each
utterance and the start and end times for each word.

Created By: User

Format: Randomly organized, 256 words/record

          Columns   Contents
Line 1    1 - 2     number of words in utterance, followed by a comma
          3 - 14    compressed data file name, including three
                    letter subdirectory name, colon, data file
                    name plus -.CD extension
Line 2    1 - 22    machine number, beginning time, end time
                    separated by commas for each word in the
                    utterance



A typical entry for the compressed data file A1234.CD might then look like
this:

4,A1234.CD

1, 1,25,2,20,50,3,45,80,4,70,100
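
A two-line entry of this kind is simple to read mechanically. The following sketch parses one entry into the file name, the optional subdirectory, and one (machine number, beginning, end) triple per word. The function name and the returned dictionary are hypothetical, and the check that the second line holds exactly three values per word is an assumption drawn from the format description rather than documented MEND behavior.

```python
def parse_mend_entry(line1, line2):
    """Parse one two-line hand-cutting entry (assumed comma-separated)."""
    count_text, name = line1.split(",", 1)
    count = int(count_text)
    # The subdirectory prefix ("LHN:") is optional in the examples shown.
    subdir, _, filename = name.strip().rpartition(":")
    fields = [int(f) for f in line2.split(",")]
    if len(fields) != 3 * count:
        raise ValueError("expected machine, begin, end for each word")
    words = [tuple(fields[i:i + 3]) for i in range(0, len(fields), 3)]
    return {"file": filename, "subdir": subdir or None, "words": words}


entry = parse_mend_entry("4,A1234.CD", "1, 1,25,2,20,50,3,45,80,4,70,100")
with_subdir = parse_mend_entry("4, LHN:A1234.CD",
                               "1, 1,25,2,20,50,3,45,80,4,70,100")
```

In the example entry, the beginning of each word precedes the end of the word before it (for instance 20 against 25), so a reader of this file should not assume that the marked word intervals are disjoint.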

RESCUE indexes identifying numbers of best transition letter
sets

File: REDEEM

Description: For the CHAINMIND version of VDGS, REDEEM is a user created
file holding the RESCUE indexes of each transition letter set
to be chosen by RESCUE, one per line, in order, so that the
first number corresponds to machine 0, the second to machine
1, etc.



Created By: User 

Format: Randomly organized, 256 words/record 



Columns   Contents
1 - 2     RESCUE index for this item.

Data location file

Description: For the CHAINMIND version of VDGS, the file WHERE holds the
location of the counter data files created by REVEX. This
location is to be specified in the form disk unit:subdirectory
name.

Created By: User

Format: Randomly organized, 256 words/record

Columns   Contents
1 - 7     disk unit : subdirectory name

For example, a typical entry in WHERE might be DP2:USG
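
Reading such an entry amounts to splitting on the colon. The sketch below assumes a three-character disk unit and a three-character subdirectory name, which is consistent with the seven-column field and with the "(3 CHAR)" prompts used elsewhere in VDGS, but the length check and the function name are illustrative assumptions only.

```python
def parse_location(entry):
    """Split a 'disk unit:subdirectory' entry such as 'DP2:USG'."""
    disk, sep, subdir = entry.strip().partition(":")
    # Assumed: both parts are exactly three characters long.
    if not sep or len(disk) != 3 or len(subdir) != 3:
        raise ValueError("expected the form DDD:SSS, got %r" % entry)
    return disk, subdir


location = parse_location("DP2:USG")
```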
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A. 10 DATA FILES AND COMMAND FILES FOR VDGS PROCESSING 

The following pages contain tables of important data files used during 
VDGS processing, compile and load macros for all VDGS programs, and command 
files for the execution of CHAINMIND. 
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TABLI A4, SINE QtA HON PILI8 OF VtXSS PROCISSZMG 



EXTRACT 

ESG 

GZBC 



Input £il«t 

I " ' 

MMSXT* 

PFIUE 

-.CD 

WXZ.ST 

-.CD 



Output Fil»« 
-•CD# -.RD 



TRIX**.TM 
TRLS**.TM 



Routines        Input files     Output files

RESCUE          TRIX**.TM       MC**.TL
                TRLS**.TM
                REDEEM

SIGH            MC**.TL         MC**.TL
                                MC**.XY

LOOPER          ES$***$**       MC**.LP
                -.CD
                MC**.TL

REVEXA          MNSET*          CDAT.RV
                -.CD            CIDX.RV
                MC**.TL
                MC**.LP

RVDIT           CDAT.RV         CDAT**.RV
                CIDX.RV
                (RVCARDS)

COVERT          CDAT**.RV       CM**
                                COV.ST

INVERT          CM**            INCM**

CROAK           INCM**          QDAT**.RR
                CDAT**.RV       QDAT**.AP
                CIDX.RV         MUDT**.RR
                COV.ST          MUDT**.AF
                                RVX.ST

REVEX           INCM**          CDAT.RV
                COV.ST          CIDX.RV
                RVX.ST
                MNSET*
                MC**.TL
                MC**.LP

TABLE A4. SINE QUA NON FILES OF VDGS PROCESSING (Cont)

Routines        Input files     Output files

ADDER           CDAT.RV         LTVP.MM
                CIDX.RV

AVRAJ           CDAT.RV         AVWRD.ST
                CIDX.RV

CRAP            AVWRD.ST        CONGAP
                CDAT.RV         CAP.ST
                CIDX.RV

GAPSTER         CONGAP          GAPMAX
                CAP.ST          GAP.DT
                                QASM.DT

SORTRA/         GAPMAX/         GAPMAX/
SORTRB          CONGAP          CONGAP

MUTE            MUDT**.RR       LOOPY
                MUDT**.AP       QLSTATS

GLOVE           QDAT**.RR       QOPT**.RR
                QDAT**.AP       QOFT**.AF
                                APA
                                APR



Routines:       TAILOR, BUILDER, DEALER, PHEW

Input files:    QDFT**.RR, QDFT**.AF, LOOPY, WHAT, CAP.ST, WHERE, LVTF.MM,
                MC**.TL, MC**.LP, COV.ST, INCM**, RVX.ST, MDFL.MM, QASM.DT,
                GAP.DT, MIND.VD, APA, APR

Output files:   WHAT, MDFL.MM, MIND.VD, MIND.VD



TABLE A4. SINE QUA NON FILES OF VDGS PROCESSING (Cont)

(Auxiliary Routines)

Routines:       GASP, ESDIT, ESGDIT, GWIZ, MEND

Input files:    MC**.TL, MC**.LP, ES$***$**, GWIZ.CD, -.CD, MEMO.WD

Output files:   none, ES$***$**, ES$***$**, none, ES$***$**
TABLE A5. COMPILE AND LOAD MACROS FOR VDGS ROUTINES

Routine         Compile macro   Load macro

EXTRACT         EXTCP.XM        [illegible]
ESG             ESGCP.XM        ESGLD.XM
GZEC            GENCP.XM        GENLD.XM
RESCUE          RESCP.XM        RESLD.XM
SIGH            SIGHCP.XM       SIGHLD.XM
LOOPER          LPCP.XM         LPLD.XM
REVEXA          RVXACP.XM       RVALD.XM
RVDIT           RVDCP.XM        RVDLD.XM
COVERT          COVCP.XM        COVLD.XM
INVERT          INVCP.XM        INVLD.XM
CROAK           CROCP.XM        CROLD.XM
REVEX           RVXCP.XM        RVLD.XM
ADDER           ADDERCP.XM      ADDERLD.XM
AVRAJ           AVRAJCP.XM      AVRAJLD.XM
CRAP            CRAPCP.XM       CRAPLD.XM
GAPSTER         GAPSTERCP.XM    GAPSTERLD.XM
SORTRA          SORTRACP.XM     SORTRALD.XM
SORTRB          SORTRBCP.XM     SORTRBLD.XM
MUTE            MUTECP.XM       MUTELD.XM
GLOVE           GLOVECP.XM      GLOVELD.XM
TAILOR          TAILORCP.XM     TAILORLD.XM
BUILDER         BUILDERCP.XM    BUILDERLD.XM
DEALER          DEALERCP.XM     DEALERLD.XM
PHEW            PHEWCP.XM       PHEWLD.XM
TABLE A6. COMPILE AND LOAD MACROS FOR SPECIAL CHAINMIND ROUTINES

Routine         Compile macro   Load macro

ZESG            ZESGCP.XM       ZESGLD.XM
ZGZEC           ZGENCP.XM       ZGENLD.XM
ZRESCUE         ZRESCP.XM       ZRESLD.XM
ZSIGH           ZSIGHCP.XM      ZSIGHLD.XM
ZLOOPER         ZLPCP.XM        ZLPLD.XM
ZREVEXA         ZRVXACP.XM      ZRVALD.XM
ZRVDIT          ZRVDCP.XM       ZRVDLD.XM
ZCOVERT         ZCOVCP.XM       ZCOVLD.XM
ZINVERT         ZINVCP.XM       ZINVLD.XM
ZCROAK          ZCROCP.XM       ZCROLD.XM
ZREVEX          ZRVXCP.XM       ZRVLD.XM
ZADDER          ZADDERCP.XM     ZADDERLD.XM
ZAVRAJ          ZAVRAJCP.XM     ZAVRAJLD.XM
ZCRAP           ZCRAPCP.XM      ZCRAPLD.XM
ZGAPSTER        ZGAPSTERCP.XM   ZGAPSTERLD.XM
ZMUTE           ZMUTECP.XM      ZMUTELD.XM
ZGLOVE          ZGLOVECP.XM     ZGLOVELD.XM
ZTAILOR         ZTAILORCP.XM    ZTAILORLD.XM
ZDEALER         ZDEALERCP.XM    ZDEALERLD.XM
ZPHEW           ZPHEWCP.XM      ZPHEWLD.XM
TABLE A7. COMPILE AND LOAD MACROS FOR AUXILIARY ROUTINES

Routine         Compile macro   Load macro

ESDIT           ESDCP.XM        ESDLD.XM
ESGDIT          ESG1CP.XM       [illegible]
GASP            GSPCP.XM        GSPLD.XM
GWIZ            GHZCP.XM        GHZLD.XM
MEND            MENCP.XM        MENLD.XM



TABLE A8. DATA FILES DELIVERED WITH VDGS

MNSET*  - magic number set files

WXZ.ST  - file of length and stretch factors

PFILE   - prompting file used by ZESG in CHAINMIND
TABLE A9. COMMAND FILES FOR CHAINMIND

/3-26-79
/GENTL - MACRO TO CREATE EXAMPLE SPACES AND TRANSITION LETTER SETS
/THIS IS THE FIRST MACRO OF CHAINMIND.
DELETE ES$-.-
DELETE TRIX-.TM TRLS-.TM
MESSAGE START PROGRAM ESG
ZESG
MESSAGE START PROGRAM GZEC
ZGZEC
/3-26-79
/[illegible] AND TRANSITION LETTER SETS
/THIS IS THE SECOND MACRO IN CHAINMIND.
DELETE MC-.TL
MESSAGE START PROGRAM RESCUE
ZRESCUE
MESSAGE START PROGRAM SIGH
ZSIGH
DELETE MC-.XY MC-.LP
MESSAGE START PROGRAM LOOPER
ZLOOPER
DELETE CDAT-.- CIDX.RV
MESSAGE START PROGRAM REVEXA
ZREVEXA
MESSAGE START PROGRAM RVDIT
ZRVDIT
MESSAGE START PROGRAM COVERT
ZCOVERT
MESSAGE START PROGRAM INVERT
ZINVERT
MESSAGE START PROGRAM CROAK
ZCROAK
DELETE CDAT-.- CIDX.RV QDAT-.- MUDT-.- RVX.ST
MESSAGE START PROGRAM REVEX
ZREVEX
MESSAGE START PROGRAM RVDIT
ZRVDIT
MESSAGE START PROGRAM CROAK
ZCROAK
MESSAGE START PROGRAM ADDER
ZADDER
MESSAGE START PROGRAM AVRAJ
ZAVRAJ
MESSAGE START PROGRAM CRAP
ZCRAP
MESSAGE START PROGRAM GAPSTER
ZGAPSTER
MESSAGE START PROGRAM SORTRA
SORTRA
MESSAGE START PROGRAM SORTRB
SORTRB
DELETE GAPMAX QASM.DT GAP.DT
MESSAGE START PROGRAM GAPSTER
ZGAPSTER
DELETE LOOPY
MESSAGE START PROGRAM MUTE
DELETE AFA APR
MESSAGE START PROGRAM GLOVE
ZGLOVE
DELETE WHAT
MESSAGE START PROGRAM TAILOR
TAILOR
MESSAGE START PROGRAM BUILDER
BUILDER
MESSAGE START PROGRAM DEALER
ZDEALER
MESSAGE START PROGRAM PHEW
ZPHEW
APPENDIX B

PERFORMANCE ANALYSIS SUBSYSTEM USERS MANUAL

B.1 PROGRAM DESCRIPTIONS

The following pages include descriptions of the four programs which
comprise the Performance Analysis Subsystem (PASS) of VIAS. These programs
are designed to exercise BIGMINT, the research version of MINT, and to
analyze the data collected by BIGMINT.
BIGMINT

Title: BIGMINT.SV

Purpose:

The purpose of BIGMINT is to find the 10 best explanations
of each utterance, as viewed by the MINT algorithm.

Printout:

For each utterance:

1) A table of properties and costs associated with
each node in the utterance.

2) The IQGAP matrix, showing which gap costs were
computed and the resulting costs.

3) A table of the ten best paths.

User Dialog:

BIGMINT <MEX DATA PACKETS>/I <OUTPUT FILE>/O <MIND FILE>/D
<NUMBER OF UTTERANCES TO PROCESS>/N [LISTING FILE]/L

STOP ALL DONE

Global Switches:

/A   Use STATSUM type MIND file

/P   Print the MIND file

Note: If the /A option is not used, BIGMINT
will create a STATSUM type MIND file
named "STATSUM.VD".

Local Switches:

/D   MIND file of either type

/I   input file name of data packets
     created by MEX using the global
     /A option.

/O   output file of the ten best paths
     for use in STATSUM. The output
     file must exist.

/N   the maximum number of data packets to
     be processed.

Optional:

/L   listing file name; default is the line printer.
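
The way the switches attach to a command line can be illustrated with a short Python sketch. It is not part of PASS. It assumes the usual RDOS convention that global switches are appended to the program name and local switches to the argument they qualify; the function name parse_command and the file names TEST.PK, TEST.RE and LIST.LS are hypothetical.

```python
def parse_command(line: str):
    """Split an RDOS-style command line into program, global switches, arguments."""
    tokens = line.split()
    # Global switches follow the program name: BIGMINT/A/P
    program, *global_switches = tokens[0].split("/")
    arguments = {}
    for token in tokens[1:]:
        # A local switch follows the argument it qualifies: TEST.PK/I
        value, slash, switch = token.rpartition("/")
        if not slash:
            raise ValueError(f"argument without a local switch: {token}")
        arguments[switch.upper()] = value
    return program, {s.upper() for s in global_switches}, arguments


program, global_switches, arguments = parse_command(
    "BIGMINT/A TEST.PK/I TEST.RE/O STATSUM.VD/D 50/N LIST.LS/L"
)
# /I, /O, /D and /N are required by the dialog above; /L is optional.
missing = {"I", "O", "D", "N"} - arguments.keys()
```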
Input Files:

MIND file         - VDGS generated voice data file.

data packet file  - output from MEX with
                    global /A option.

Output Files:

recognition file  - (-.RE) contains the 10 best paths found by BIGMINT
                    for later use in STATSUM.

Error Messages:

STOP NO MIND FILE GIVEN

(This occurs when no MIND file is given in the command line.)

STOP RECOGNITION FILE OPEN ERROR

(This occurs when the recognition file specified in the command line
does not exist.)

NON-MATCHING NUMBER OF MACHINES BETWEEN VOICE DATA FILE:<MIND FILE>
AND MINT, NAMELY <MINTS #> AND <MIND FILES #>

(If the number of machine types MINT expects is not equal to the
number of machine types the MIND file was created for, MINT won't
run. To correct this, recompile MINT with new value for
parameter "MACHN".)

NON-MATCHING REVISION KEYS BETWEEN VOICE DATA FILE: <MIND FILE>
AND MINT, NAMELY <MIND FILES REVISION KEY> AND <MINTS REVISION KEY>

(If these keys are different, it means that the MIND file and MINT
are expecting different formats of the voice data.)

STOP - END OF DATA PACKETS

(This occurs when the number of packets to process specified
in the command line with the local /N is greater than the
number of utterances in the input file.)
STATSUM

Title: STATSUM.SV

Purpose:

The purpose of STATSUM is to break up the decision-making
process used in the MINT algorithm into its component parts.
This enables the user to examine the contribution of each
of the MINT cost functions.
STATSUM has three main functions:

1) To determine the recognition error category
and type for each utterance.

2) To gather data for later use in SSPLOT.

3) To gather data for use in LICVAT.

Printout:

For each utterance:

1) The total cost over each path of each cost component.

2) The category and type of the "toughest critical decision".

3) The difference in total costs between each incorrect path
and the best correct path.

User Dialog:

STATSUM <MEX DATA PACKETS>/I <OUTPUT FILE>/O <MIND FILE>/D
[LISTING FILE]/L

STOP - STATSUM ALL DONE

Global Switches:

/A   use STATSUM type MIND file

/N   create the data for SSPLOT

/P   print the MIND file

/Q   generate the data for LICVAT

Note: If the /A option is not used, STATSUM
will create a STATSUM type MIND file
named "STATSUM.VD".

Local Switches:

/D   MIND file of either type

/I   input file name of data packets
     created by MEX using the global
     /A option.

/O   output file of the ten best paths
     for use in STATSUM. The output
     file must exist.

Optional:

/L   listing file name; default is the line printer.

Input Files:

MIND file         - (-.VD) VDGS-generated voice data file.

data packet file  - (-.PK) output from MEX with global
                    /A option.

recognition file  - (-.RE) contains the 10 best paths found by
                    BIGMINT

STATSUM.NM        - this file contains the base of
                    the temporary files that STATSUM
                    will append data to for later use in
                    SSPLOT and LICVAT.

Note: STATSUM.NM should contain "XXX",
where XXX are any three valid characters
for RDOS filenames.

Output Files:

XXX.00      temporary files to store data for
XXX.01      use in SSPLOT and LICVAT
  ...
XXX.12

SSCOUNTER - contains counts of gap occurrences,
            intrinsic properties

Note: SSCOUNTER and the other temporary files are appended to and
should be deleted and created before each block of data
(test data, interim test data, etc.) that PASS is run over.
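
Because these files are appended to, it can help to list every name that must be deleted and created again before a new block of data. The Python sketch below is an illustration only; the function name temp_file_names and the root "TST" are hypothetical, and the sketch assumes the temporary files are numbered 00 through 12 as in the output file list.

```python
def temp_file_names(root: str, last: int = 12):
    """Return SSCOUNTER plus the temporary file names root.00 through root.last."""
    if len(root) != 3:
        raise ValueError("the root held in STATSUM.NM is three characters long")
    numbered = [f"{root}.{number:02d}" for number in range(last + 1)]
    return ["SSCOUNTER"] + numbered


names = temp_file_names("TST")
```

With the root "TST" the list runs from SSCOUNTER and TST.00 through TST.12.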
Error Messages:

STOP NO MIND FILE GIVEN

(This occurs when no MIND file is given in the command line.)

STOP RECOGNITION FILE OPEN ERROR

(This occurs when the recognition file specified in the command line
does not exist.)

NON-MATCHING NUMBER OF MACHINES BETWEEN VOICE DATA FILE:<MIND FILE>
AND STATSUM, NAMELY <STATSUMS #> AND <MIND FILES #>

(If the number of machine types STATSUM expects is not equal to the
number of machine types the MIND file was created for, STATSUM won't
run. To correct this, recompile STATSUM with new value for
parameter "MACHN".)

NON-MATCHING REVISION KEYS BETWEEN VOICE DATA FILE: <MIND FILE>
AND STATSUM, NAMELY <MIND FILES REVISION KEY> AND
<STATSUMS REVISION KEY>

(If these keys are different, it means that the MIND file and STATSUM
are expecting different formats of the voice data.)

STOP - NON MATCHING PACKETS AND RECOGNITION - PATHREAD

(This occurs when the MEX data packets and BIGMINT recognition
data were created from different data.)
SSPLOT

Title: SSPLOT.SV

Purpose:

The purpose of SSPLOT is to plot cumulative distributions
of each of the individual cost components used in the MINT
algorithm, to evaluate the usefulness of each cost function
as an information source, and to give a list of each interesting
group (category and type) of errors for several magic number sets.

Printout:

For each category and type and cost of interest:

1) An ordered list of the utterances and cost differences used
in each plot.

2) A cumulative plot of the cost differences (if possible).

3) The amount of information contained in this cost.
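
The idea of a cumulative plot of cost differences can be sketched in Python as follows. This is a generic empirical cumulative distribution, not the SSPLOT code; the function name cumulative_distribution and the sample values are hypothetical.

```python
def cumulative_distribution(differences):
    """Return (value, cumulative fraction) pairs for an ordered list of differences."""
    ordered = sorted(differences)
    total = len(ordered)
    # The k-th smallest value is reached by a fraction k / total of the sample.
    return [(value, (rank + 1) / total) for rank, value in enumerate(ordered)]


points = cumulative_distribution([3.0, -1.0, 2.0, 0.5])
```

Each pair gives one point of the curve: the cost difference on the cost axis and the fraction of utterances at or below it.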
User Dialog:

SSPLOT

ENTER DESCRIPTION OF PLOTS

DO YOU WANT TO PLOT ALL COSTS (Y OR N)?

(A "Y" answer causes a separate plot to be made for each cost
component, an "N" will get the following question.)

ENTER COST # (1-12) THAT YOU WANT PLOTTED
OR -1 TO END

(The user may enter any or all of the costs, one at a time.
After each number entered the question will be repeated
until a "-1" is entered.)

DO YOU WANT A PLOT OF ALL CATEGORIES (Y OR N)?

(If you answer "Y" all categories will be plotted and the next
question will be skipped. If you answer "N" the following
question will appear.)

ENTER THE CATEGORY TO BE PLOTTED OR -1 TO END

(Here you enter the categories you want, one at a time; the program
will repeat the question after each entry, until you enter "-1"
to end.)

DO YOU WANT A PLOT OF ALL TYPES IN CATEGORY 1 (Y/N)?

("Y" gets a separate plot for each cost for insertions, deletions,
and substitutions, an "N" gets the following question:)

ENTER TYPE: 0 = (0,1), 1 = (1,0), 2 = (1,1), -1 = GO ON

(You enter -1, 0, 1, or 2 depending on the type you want.
This question will also repeat until you enter -1.)

DO YOU WANT A PLOT OF INCORRECT RECOGNITIONS ONLY ON THE
SAME AXIS? (Y/N)

(Enter "Y" or "N")

ENTER THE SCALE OF THE PLOT

(Enter an integer (10 is a nice number) for the scale of the
cost axis of the plots.)

STOP - SSPLOT ALL DONE
Global Switches:

/A   This will cause every possible cost, category, type
     plot to be done with incorrect only on same axis
     and a scale of 10, and will eliminate all of the above
     questions.

Input Files:

STATSUM.NM  - This file contains the root of the
              temporary files (XXX.-) used
              by SSPLOT.

XXX.-       - STATSUM-generated data files.

Output Files:

None

Error Messages:

None
LICVAT

Title: LICVAT.SV

Purpose:

The purpose of LICVAT is to test certain assumptions about
the distribution of some of the properties and costs used
in LISTEN, namely the occurrence of each violation category, the
L-counter costs, the inter-word gap lengths, and the frequency
of association between each pair of machine types.

Printout:

1) A table of counts of violation categories by machine type
for real recognitions and artifact nodes.

2) A table of association counts machine type by machine type
for both reals and artifacts.

3) A cumulative plot of a function of L-counter
values designed to produce a rectangular distribution
for reals and artifacts.

4) A table of start gap data by machine type.

5) A table of end gap data by machine type.

6) A cumulative plot of a function of gap values, designed to
give a rectangular distribution for reals and artifacts.
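
If the plotted function really does yield a rectangular (uniform) distribution, its cumulative plot lies close to a straight diagonal. The Python sketch below measures the largest departure from that diagonal. It is an illustration, not the LICVAT computation; it assumes the function values have already been scaled to the range 0 to 1, and the name max_deviation_from_rectangular is hypothetical.

```python
def max_deviation_from_rectangular(values):
    """Largest gap between the empirical cumulative curve and the diagonal."""
    ordered = sorted(values)
    count = len(ordered)
    worst = 0.0
    for rank, value in enumerate(ordered):
        # Compare the diagonal with the step just below and just above this value.
        below = abs(value - rank / count)
        above = abs((rank + 1) / count - value)
        worst = max(worst, below, above)
    return worst


even = max_deviation_from_rectangular([0.125, 0.375, 0.625, 0.875])
bunched = max_deviation_from_rectangular([0.0, 0.0, 0.0, 0.0])
```

Evenly spread values give a small deviation, while values bunched at one end give a deviation near 1.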



User Dialog:

LICVAT

STOP ALL DONE

Global Switches:

/C   print the counts of violation categories and associations

/L   print the cumulative plot for L-counter data

/Q   print the start-end gap count data and the plot
     of the adjusted gap function

Note: If no global switch is given, LICVAT will do nothing.
Input Files:

SSCOUNTER   - the accumulated counts for violation,
              association, and gap data STATSUM produced.

STATSUM.NM  - holds the root of the name for the temporary
              files (XXX.-)

QLSTATS     - created by MUTE, this file contains data
              necessary to compute the L-counter plot

XXX.-       - the temporary files created by STATSUM

Output Files:

None

Error Messages:

None
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